Assessing the Portability of Parameter Matrices Trained by
Parameter-Efficient Finetuning Methods
- URL: http://arxiv.org/abs/2401.14228v1
- Date: Thu, 25 Jan 2024 15:11:07 GMT
- Title: Assessing the Portability of Parameter Matrices Trained by
Parameter-Efficient Finetuning Methods
- Authors: Mohammed Sabry and Anya Belz
- Abstract summary: We investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another.
We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques.
We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques.
- Score: 6.653947064461629
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the cost of training ever larger language models has grown, so has the
interest in reusing previously learnt knowledge. Transfer learning methods have
shown how reusing non-task-specific knowledge can help in subsequent
task-specific learning. In this paper, we investigate the inverse: porting
whole functional modules that encode task-specific knowledge from one model to
another. We designed a study comprising 1,440 training/testing runs to test the
portability of modules trained by parameter-efficient finetuning (PEFT)
techniques, using sentiment analysis as an example task. We test portability in
a wide range of scenarios, involving different PEFT techniques and different
pretrained host models, among other dimensions. We compare the performance of
ported modules with that of equivalent modules trained (i) from scratch, and
(ii) from parameters sampled from the same distribution as the ported module.
We find that the ported modules far outperform the two alternatives tested, but
that there are interesting performance differences between the four PEFT
techniques. We conclude that task-specific knowledge in the form of
structurally modular sets of parameters as produced by PEFT techniques is
highly portable, but that the degree of success depends on the type of PEFT and on
differences between the originating and receiving pretrained models.
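The paper does not include code, so the following is a minimal sketch, assuming a LoRA-style low-rank module in PyTorch as the ported parameter matrix; all class and function names are illustrative, not the authors'. It contrasts the ported condition with baseline (ii), sampling from the same distribution as the ported module; baseline (i), training from scratch, simply keeps the receiving module's fresh initialisation and trains it on the task data.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update: base(x) + x A^T B^T."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank matrices are task-specific
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

def port_module(trained: LoRALinear, receiving: LoRALinear) -> None:
    """Ported condition: copy the task-trained matrices into the receiving host model."""
    with torch.no_grad():
        receiving.A.copy_(trained.A)
        receiving.B.copy_(trained.B)

def distribution_matched_init(trained: LoRALinear, receiving: LoRALinear) -> None:
    """Baseline (ii): redraw parameters from a Gaussian matched to the ported module's
    empirical mean and standard deviation, discarding the learned structure."""
    with torch.no_grad():
        receiving.A.normal_(trained.A.mean().item(), trained.A.std().item())
        receiving.B.normal_(trained.B.mean().item(), trained.B.std().item())
```

Copying parameters across hosts in this way assumes the attachment point has the same dimensions in the originating and receiving models; the abstract's finding that success depends on differences between the two pretrained models suggests such compatibility is necessary but not sufficient.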
Related papers
- Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models [56.93608812478369]
We present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization.
L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks.
Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
arXiv Detail & Related papers (2024-08-16T23:57:29Z) - Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, because pretraining is formulated as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting
Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
arXiv Detail & Related papers (2023-10-24T23:29:06Z) - Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z) - UniPELT: A Unified Framework for Parameter-Efficient Language Model
Tuning [64.638804236566]
We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup (a minimal sketch of this gating idea follows this list).
Remarkably, on the GLUE benchmark, UniPELT consistently achieves 13pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
arXiv Detail & Related papers (2021-10-14T17:40:08Z) - Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.