Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models
- URL: http://arxiv.org/abs/2408.09053v2
- Date: Wed, 30 Oct 2024 01:38:27 GMT
- Title: Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models
- Authors: Vladimir Araujo, Marie-Francine Moens, Tinne Tuytelaars
- Abstract summary: We present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization.
L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks.
Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
- Score: 56.93608812478369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) methods are increasingly used with pre-trained language models (PLMs) for continual learning (CL). These methods typically involve training a PEFT module for each new task and employing similarity-based selection to route modules during inference. However, they face two major limitations: 1) interference during module training with already learned modules and 2) suboptimal routing when composing modules. In this paper, we present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization. L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks. We evaluate our method in two CL setups using various benchmarks. Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
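As a rough illustration of the routing idea described in the abstract, the sketch below composes frozen, task-specialized adapters with a learned router. This is a minimal sketch under assumed shapes, not the authors' implementation; all names (BottleneckAdapter, Router, compose) and hyperparameters are hypothetical.

```python
# Minimal sketch of router-based adapter composition (hypothetical names;
# not the paper's code). Assumes a frozen PLM that yields hidden states.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """A task-specific PEFT module (residual bottleneck adapter)."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))


class Router(nn.Module):
    """Scores the learned adapters for a given input representation."""

    def __init__(self, d_model: int, n_adapters: int):
        super().__init__()
        self.score = nn.Linear(d_model, n_adapters)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Per-example softmax weights over the available adapters.
        return torch.softmax(self.score(h.mean(dim=1)), dim=-1)


def compose(h: torch.Tensor, adapters: nn.ModuleList, router: Router) -> torch.Tensor:
    """Mix the outputs of frozen, task-specialized adapters with router weights."""
    weights = router(h)                                      # (batch, n_adapters)
    outputs = torch.stack([a(h) for a in adapters], dim=1)   # (batch, n, seq, d)
    return (weights[:, :, None, None] * outputs).sum(dim=1)  # (batch, seq, d)


# Usage sketch: a new adapter would be trained alone on its task (others frozen),
# and the router (re)trained on a small memory of past-task examples.
d_model, n_tasks = 768, 3
adapters = nn.ModuleList([BottleneckAdapter(d_model) for _ in range(n_tasks)])
router = Router(d_model, n_tasks)
h = torch.randn(2, 16, d_model)        # stand-in for frozen PLM hidden states
mixed = compose(h, adapters, router)   # (2, 16, 768)
```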
Related papers
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z)
- Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods [6.653947064461629]
We investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another.
We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques.
We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques.
arXiv Detail & Related papers (2024-01-25T15:11:07Z)
- SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [71.78800549517298]
Continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world.
Existing methods devise a learning module that acquires task-specific knowledge with a parameter-efficient tuning (PET) block, and a selection module that picks out the corresponding block for a test input.
We propose a novel Shared Attention Framework (SAPT) that aligns PET learning and selection via a Shared Attentive Learning & Selection module.
arXiv Detail & Related papers (2024-01-16T11:45:03Z)
- Composing Parameter-Efficient Modules with Arithmetic Operations [20.119291936493788]
We propose to compose parameter-efficient modules through linear arithmetic operations in the weight space.
Our approach requires no additional training and enables highly flexible module composition (a brief sketch of such weight-space composition appears after this list).
We extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA.
arXiv Detail & Related papers (2023-06-26T17:33:21Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike previous SMoE-based modular language models, ModuleFormer can induce modularity from uncurated data.
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
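To illustrate the weight-space arithmetic referenced in the "Composing Parameter-Efficient Modules with Arithmetic Operations" entry above: a new module's parameters can be formed as a weighted sum (or difference) of existing module parameters. This is a minimal sketch; the function and variable names are hypothetical and the tensors are stand-ins, not the paper's actual code.

```python
# Minimal sketch of composing PEFT modules by linear arithmetic in weight
# space (illustrative only). Addition merges skills; a negative coefficient
# can subtract a behavior, as in the detoxification example above.
import torch


def combine_modules(state_dicts, coefficients):
    """Return a new module state dict as a weighted sum of existing ones."""
    combined = {}
    for name in state_dicts[0]:
        combined[name] = sum(c * sd[name] for c, sd in zip(coefficients, state_dicts))
    return combined


# Usage: add two task modules and partially negate a "toxic" module.
module_a = {"lora.weight": torch.randn(8, 768)}
module_b = {"lora.weight": torch.randn(8, 768)}
module_tox = {"lora.weight": torch.randn(8, 768)}
merged = combine_modules([module_a, module_b, module_tox], [1.0, 1.0, -0.5])
```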