Continual Learning via Local Module Composition
- URL: http://arxiv.org/abs/2111.07736v1
- Date: Mon, 15 Nov 2021 13:34:15 GMT
- Title: Continual Learning via Local Module Composition
- Authors: Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin
- Abstract summary: Local module composition (LMC) is an approach to modular continual learning.
LMC provides each module with a local structural component that estimates the module's relevance to the input.
- Score: 11.380264053565082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modularity is a compelling solution to continual learning (CL), the problem
of modeling sequences of related tasks. Learning and then composing modules to
solve different tasks provides an abstraction to address the principal
challenges of CL including catastrophic forgetting, backward and forward
transfer across tasks, and sub-linear model growth. We introduce local module
composition (LMC), an approach to modular CL where each module is provided a
local structural component that estimates a module's relevance to the input.
Dynamic module composition is performed layer-wise based on local relevance
scores. We demonstrate that agnosticity to task identities (IDs) arises from
(local) structural learning that is module-specific as opposed to the task-
and/or model-specific as in previous works, making LMC applicable to more CL
settings compared to previous works. In addition, LMC also tracks statistics
about the input distribution and adds new modules when outlier samples are
detected. In the first set of experiments, LMC performs favorably compared to
existing methods on the recent Continual Transfer-learning Benchmark without
requiring task identities. In another study, we show that the locality of
structural learning allows LMC to interpolate to related but unseen tasks
(OOD), as well as to compose modular networks trained independently on
different task sequences into a third modular network without any fine-tuning.
Finally, in search of LMC's limitations, we study it on more challenging
sequences of 30 and 100 tasks, demonstrating that local module selection
becomes much more challenging in the presence of a large number of candidate
modules. In this setting, the best-performing LMC spawns far fewer modules
than an oracle-based baseline but reaches a lower overall accuracy. The
codebase is available at https://github.com/oleksost/LMC.
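
For intuition, here is a minimal PyTorch sketch of layer-wise local composition. It assumes, as one possible instantiation rather than the released codebase's exact design, that each module's local structural component is a small autoencoder whose negative reconstruction error serves as the relevance score; all class names and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalModule(nn.Module):
    # A functional block plus a local structural component. Here the
    # structural component is a small autoencoder on the module's input;
    # its negative reconstruction error is used as the relevance score.
    def __init__(self, in_dim, out_dim, code_dim=16):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.encoder = nn.Linear(in_dim, code_dim)
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        return self.block(x)

    def relevance(self, x):
        recon = self.decoder(torch.relu(self.encoder(x)))
        # Higher relevance = lower reconstruction error on this input.
        return -F.mse_loss(recon, x, reduction="none").mean(dim=-1)


class LMCLayer(nn.Module):
    # Layer-wise local composition: module outputs are combined with
    # weights given by a softmax over the modules' local relevance scores.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.local_modules = nn.ModuleList([LocalModule(in_dim, out_dim)])

    def forward(self, x):
        scores = torch.stack([m.relevance(x) for m in self.local_modules], dim=-1)
        weights = torch.softmax(scores, dim=-1)                         # (batch, n_modules)
        outs = torch.stack([m(x) for m in self.local_modules], dim=-1)  # (batch, out_dim, n_modules)
        return (outs * weights.unsqueeze(1)).sum(dim=-1), scores

    def maybe_expand(self, scores, threshold=-1.0):
        # Toy expansion rule: if even the most relevant module scores below
        # a threshold, treat the input as an outlier and add a fresh module.
        if scores.max().item() < threshold:
            self.local_modules.append(LocalModule(self.in_dim, self.out_dim))


layer = LMCLayer(in_dim=32, out_dim=64)
y, scores = layer(torch.randn(8, 32))
layer.maybe_expand(scores)
```

Module outputs at each layer are mixed with softmax weights over the local relevance scores, and a layer can spawn a fresh module when even its best score falls below a threshold, mirroring the outlier-driven expansion described in the abstract.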
Related papers
- Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with a subset of modules and for dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
- Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models [56.93608812478369]
We present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization.
L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks.
Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
arXiv Detail & Related papers (2024-08-16T23:57:29Z)
- SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [71.78800549517298]
Continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world.
Existing methods devise the learning module to acquire task-specific knowledge with a parameter-efficient tuning (PET) block, and the selection module to pick out the corresponding block for the test input.
We propose a novel Shared Attention Framework (SAPT) to align the PET learning and selection via the Shared Attentive Learning & Selection module.
arXiv Detail & Related papers (2024-01-16T11:45:03Z)
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions.
We find that CodeChain can significantly boost both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
- A Probabilistic Framework for Modular Continual Learning [27.398496741452554]
We develop a modular continual learning framework, PICLE, to search through the large, discrete space of module compositions.
We show PICLE is the first modular CL algorithm to achieve perceptual, few-shot and latent transfer while scaling well to large search spaces.
arXiv Detail & Related papers (2023-06-11T00:06:57Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike previous SMoE-based modular language models, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Efficient Continual Learning with Modular Networks and Task-Driven Priors [31.03712334701338]
Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting.
We introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task.
Our learning algorithm leverages a task-driven prior over the exponential search space of all possible ways to combine modules, enabling efficient learning on long streams of tasks (see the sketch after this list).
arXiv Detail & Related papers (2020-12-23T12:42:16Z)
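
As a toy illustration of the last entry's idea of taming the exponential space of module compositions with a prior (the "deviate in at most one layer" rule below is a hypothetical stand-in, not the paper's actual task-driven prior):

```python
from itertools import product


def candidate_paths(modules_per_layer, prev_best=None):
    # Each path picks one module index per layer; without a prior the
    # search space has prod_l modules_per_layer[l] candidates.
    full_space = product(*(range(k) for k in modules_per_layer))
    if prev_best is None:
        return list(full_space)
    # Hypothetical "reuse" prior: keep only paths that deviate from the
    # previous task's best path in at most one layer.
    return [p for p in full_space
            if sum(a != b for a, b in zip(p, prev_best)) <= 1]


# 4 layers with 3 candidate modules each: 3**4 = 81 paths in the full space,
# but only 1 + 4 * 2 = 9 survive the reuse prior around a previous best path.
print(len(candidate_paths([3, 3, 3, 3])))                          # 81
print(len(candidate_paths([3, 3, 3, 3], prev_best=(0, 1, 0, 2))))  # 9
```

Even this crude prior shrinks the candidate set from 81 to 9 paths in the example, which is the kind of reduction that makes long task streams tractable to search.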