Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning
- URL: http://arxiv.org/abs/2210.12587v1
- Date: Sun, 23 Oct 2022 01:33:16 GMT
- Title: Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning
- Authors: Xiangyu Peng, Chen Xing, Prafulla Kumar Choubey, Chien-Sheng Wu,
Caiming Xiong
- Abstract summary: We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
- Score: 85.55727213502402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning approaches, which learn task-specific soft prompts for a
downstream task while conditioning on frozen pre-trained models, have attracted
growing interest due to their parameter efficiency. With large language models
and sufficient training data, prompt tuning performs comparably to full-model
tuning. However, with limited training samples in few-shot settings, prompt
tuning fails to match the performance of full-model fine-tuning. In this work,
we focus on improving the few-shot performance of prompt tuning by transferring
knowledge from the soft prompts of source tasks. Recognizing the strong
generalization of ensemble methods in low-data regimes, we first show
experimentally that a simple ensemble of model predictions based on
different source prompts outperforms existing multi-prompt knowledge transfer
approaches, such as source prompt fusion, in the few-shot setting. Motivated by
this observation, we further investigate model ensembles and propose
Sample-specific Ensemble of Source Models (SESoM). SESoM learns to adjust the
contribution of each source model for each target sample separately when
ensembling source model outputs. In this way, SESoM inherits the superior
generalization of model ensemble approaches while capturing the
sample-specific competence of each source prompt. We conduct experiments across
a diverse set of eight NLP tasks using models of different scales (T5-{base,
large, XL}) and find that SESoM consistently outperforms existing models of
the same and larger parameter scales by a large margin.
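The core idea of the abstract, ensembling the outputs of several source models with weights chosen separately for each target sample, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gating scores that determine the per-sample weights, and the toy shapes used, are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sample_specific_ensemble(source_logits, gate_scores):
    """Combine per-source predictions with per-sample weights.

    source_logits: (K, N, C) -- K source models, N samples, C classes
    gate_scores:   (N, K)    -- unnormalized per-sample relevance of each source
                                (in SESoM these would come from a learned gate)
    """
    weights = softmax(gate_scores, axis=-1)   # (N, K): one weight vector per sample
    probs = softmax(source_logits, axis=-1)   # (K, N, C): per-source class probabilities
    # Weighted average of source probabilities, computed independently per sample
    combined = np.einsum('nk,knc->nc', weights, probs)
    return combined

# Toy example: 2 source models, 3 samples, 2 classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 2))
scores = rng.normal(size=(3, 2))
out = sample_specific_ensemble(logits, scores)
print(out.shape)  # (3, 2); each row is a probability distribution over classes
```

Because the gate weights vary per sample, a source model can dominate the prediction for samples it is competent on while being down-weighted elsewhere, which is the property distinguishing this from a fixed (sample-agnostic) ensemble.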
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models [30.68516200579894]
We introduce CM-TTS, a novel architecture grounded in consistency models (CMs)
CM-TTS achieves top-quality speech synthesis in fewer steps without adversarial training or pre-trained model dependencies.
We present a real-time mel-spectrogram generation consistency model, validated through comprehensive evaluations.
arXiv Detail & Related papers (2024-03-31T05:38:08Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions [42.293444710522294]
Continual Multi-source Adaptation to Dynamic Distributions (CONTRAST) is a novel method that optimally combines multiple source models to adapt to the dynamic test data.
We show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting.
arXiv Detail & Related papers (2024-01-04T22:23:56Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.