Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- URL: http://arxiv.org/abs/2211.13638v1
- Date: Thu, 24 Nov 2022 14:38:08 GMT
- Title: Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- Authors: Yiqiao Jin, Xiting Wang, Yaru Hao, Yizhou Sun, Xing Xie
- Abstract summary: We propose a novel framework for fine-tuning pretrained language models (LMs).
Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
- Score: 47.880781811936345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we move towards combining large parametric models with non-parametric prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework for fine-tuning pretrained language models (LMs) that automatically learns a bias to improve predictive performance for varying data sizes, especially in low-resource settings. Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes. Moreover, we propose four principles for effective prototypical fine-tuning towards the optimal solution. Experimental results across various datasets show that our approach achieves significant performance improvements in various low-resource settings, as well as comparable, and usually better, performance in high-resource scenarios.
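As a rough illustration of the prototype-based classification that such frameworks build on (the paper's actual method additionally learns a bias and adapts model capacity to the data size, which this sketch omits), here is a minimal PyTorch example; the toy encoder and all dimensions are hypothetical stand-ins for a pretrained LM:

```python
# Minimal sketch of a prototypical classification head (not the paper's exact
# method): each class is represented by a prototype (the mean embedding of its
# examples), and inputs are classified by distance to the prototypes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a pretrained LM encoder; in practice this would be e.g. a BERT
# model producing one embedding vector per input text.
encoder = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.Tanh(), torch.nn.Linear(64, 16)
)

def prototypes(embeddings: torch.Tensor, labels: torch.Tensor, n_classes: int):
    """Class prototype = mean embedding of that class's examples."""
    return torch.stack(
        [embeddings[labels == c].mean(dim=0) for c in range(n_classes)]
    )

def proto_logits(queries: torch.Tensor, protos: torch.Tensor):
    """Logits = negative squared Euclidean distance to each prototype."""
    return -torch.cdist(queries, protos).pow(2)

# Toy data: 40 labeled "support" examples, 8 queries, 4 classes.
x_support, y_support = torch.randn(40, 32), torch.randint(0, 4, (40,))
x_query, y_query = torch.randn(8, 32), torch.randint(0, 4, (8,))

opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for step in range(100):
    protos = prototypes(encoder(x_support), y_support, n_classes=4)
    loss = F.cross_entropy(proto_logits(encoder(x_query), protos), y_query)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.3f}")
```

Classifying by distance to class-mean prototypes is the non-parametric component that the paper combines with a parametric LM encoder.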
Related papers
- Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning [2.5168710814072894]
This study addresses the practical need for a unified evaluation of tabular data generation models.
We propose a reduced search space for each model that allows for quick optimization.
For most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations.
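As a purely hypothetical illustration of what a reduced, per-model search space might look like (the study's actual spaces and models are not reproduced here), consider a plain random search over a few discrete choices:

```python
# Hypothetical illustration (not the paper's actual spaces): a reduced,
# per-model hyperparameter search space explored with plain random search.
import random

random.seed(0)

# Reduced search space for one hypothetical tabular generator.
SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [128, 256, 512],
    "embedding_dim": [64, 128],
}

def sample_config(space):
    return {name: random.choice(values) for name, values in space.items()}

def evaluate(config):
    # Stand-in for training the generator and scoring synthetic data quality.
    return -abs(config["lr"] - 3e-4) - abs(config["batch_size"] - 256) / 1024

best = max((sample_config(SEARCH_SPACE) for _ in range(20)), key=evaluate)
print("best config:", best)
```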
arXiv Detail & Related papers (2024-06-18T07:27:38Z)
- Feature Protection For Out-of-distribution Generalization [24.072876186625855]
We show that protecting pre-trained features leads to a fine-tuned model that is more robust to OOD generalization.
arXiv Detail & Related papers (2024-05-25T03:00:06Z)
- Training Survival Models using Scoring Rules [9.330089124239086]
Survival Analysis provides critical insights for incomplete time-to-event data.
It is also an important example of probabilistic machine learning.
We establish different parametric and non-parametric sub-frameworks that allow different degrees of flexibility.
We show that our framework can recover various parametric models, and we demonstrate that optimization with scoring rules works equally well compared to likelihood-based methods.
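As a toy sketch of the scoring-rule idea (not the paper's framework: censoring is ignored here for brevity, which the actual framework handles), one can fit a Weibull survival model by minimizing the Brier score, a proper scoring rule, on a grid of evaluation times instead of the likelihood:

```python
# Toy sketch: fit a Weibull survival model by minimizing the Brier score
# (a proper scoring rule) on a time grid, instead of the likelihood.
# For simplicity this ignores censoring.
import torch

torch.manual_seed(0)
n = 200
x = torch.randn(n, 3)                       # covariates
t_event = torch.rand(n) * 5 + 0.1           # observed event times (toy data)

beta = torch.zeros(3, requires_grad=True)   # log-scale regression weights
log_k = torch.zeros(1, requires_grad=True)  # log shape parameter

def survival(t, x):
    """Weibull survival S(t|x) = exp(-(t / lambda(x))^k)."""
    lam = torch.exp(x @ beta)               # per-subject scale
    k = torch.exp(log_k)
    return torch.exp(-((t / lam) ** k))

grid = torch.linspace(0.5, 4.5, 9)          # evaluation times
opt = torch.optim.Adam([beta, log_k], lr=0.05)
for step in range(300):
    loss = 0.0
    for t in grid:
        alive = (t_event > t).float()       # ground-truth survival indicator
        loss = loss + ((survival(t, x) - alive) ** 2).mean()  # Brier score
    opt.zero_grad(); loss.backward(); opt.step()
print(f"mean Brier score over grid: {loss.item() / len(grid):.4f}")
```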
arXiv Detail & Related papers (2024-03-19T20:58:38Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
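For context, a generic visual prompt tuning loop looks roughly like the sketch below; this is the standard VPT pattern, not E^2VPT's specific prompt design or pruning, and all dimensions are illustrative:

```python
# Generic visual-prompt-tuning sketch (not E^2VPT's exact design): learnable
# prompt tokens are prepended to the patch sequence of a frozen transformer,
# and only the prompts and the classification head are trained.
import torch

torch.manual_seed(0)
d_model, n_patches, n_prompts, n_classes = 64, 16, 4, 10

backbone = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
for p in backbone.parameters():
    p.requires_grad = False                 # backbone stays frozen

prompts = torch.nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
head = torch.nn.Linear(d_model, n_classes)

def forward(patches):                       # patches: (batch, n_patches, d)
    batch = patches.shape[0]
    tokens = torch.cat([prompts.expand(batch, -1, -1), patches], dim=1)
    return head(backbone(tokens).mean(dim=1))  # mean-pool, then classify

opt = torch.optim.Adam([prompts, *head.parameters()], lr=1e-3)
x = torch.randn(8, n_patches, d_model)
y = torch.randint(0, n_classes, (8,))
loss = torch.nn.functional.cross_entropy(forward(x), y)
opt.zero_grad(); loss.backward(); opt.step()
n_trainable = n_prompts * d_model + sum(p.numel() for p in head.parameters())
print(f"trainable params: {n_trainable}")
```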
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources [9.359395812292291]
This paper proposes a framework that predicts model performance and supports data selection decisions based on partial samples of prospective data sources.
The framework significantly improves on existing performance scaling approaches, both in the accuracy of performance inference and in the computation cost of constructing the performance predictor.
It also outperforms a range of other off-the-shelf solutions by a wide margin in data selection effectiveness.
arXiv Detail & Related papers (2023-07-05T17:33:41Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
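As a crude illustration of the sharing idea only (a faithful MPO decomposition factorizes each weight matrix into a train of local tensors, which this sketch collapses into three matrices), suppose each layer's weight is A_l @ C @ B_l with the central factor C shared across layers:

```python
# Crude illustration of the sharing idea (not a faithful MPO decomposition):
# each layer's weight is factorized A_l @ C @ B_l, where the large central
# factor C is shared across layers and only the small A_l, B_l are per-layer.
import torch

d, r, n_layers = 256, 32, 6

class SharedCoreLinear(torch.nn.Module):
    def __init__(self, core):
        super().__init__()
        self.core = core                                       # shared factor
        self.a = torch.nn.Parameter(torch.randn(d, r) * 0.02)  # layer-specific
        self.b = torch.nn.Parameter(torch.randn(r, d) * 0.02)  # layer-specific

    def forward(self, x):
        return torch.relu(x @ self.a @ self.core @ self.b)

core = torch.nn.Parameter(torch.randn(r, r) * 0.02)
layers = torch.nn.ModuleList(SharedCoreLinear(core) for _ in range(n_layers))

x = torch.randn(4, d)
for layer in layers:
    x = layer(x)                             # forward through the shared stack

full = n_layers * d * d                      # dense per-layer weights
shared = r * r + n_layers * 2 * d * r        # shared core + auxiliary factors
print(f"dense params: {full}, factorized: {shared} ({shared / full:.1%})")
```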
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Building Resilience to Out-of-Distribution Visual Data via Input Optimization and Model Finetuning [13.804184845195296]
We propose a preprocessing model that learns to optimise input data for a specific target vision model.
We investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles.
We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model.
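As a minimal sketch of this general pattern (the architecture and loss here are stand-ins, not the paper's), a small trainable preprocessor can be placed in front of a frozen target model and trained on the target's task loss:

```python
# Minimal sketch of the general pattern (details differ from the paper): a
# small preprocessing network is trained to transform shifted inputs so that
# a frozen target model performs well on them.
import torch

torch.manual_seed(0)

target = torch.nn.Sequential(                    # stand-in for the frozen
    torch.nn.Conv2d(3, 8, 3, padding=1),         # target vision model
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 5),
)
for p in target.parameters():
    p.requires_grad = False

# Residual preprocessor: learns a correction added to the input image.
preproc = torch.nn.Conv2d(3, 3, 3, padding=1)

opt = torch.optim.Adam(preproc.parameters(), lr=1e-3)
x = torch.randn(16, 3, 32, 32) + 0.5             # toy "shifted" inputs
y = torch.randint(0, 5, (16,))
for step in range(50):
    logits = target(x + preproc(x))              # optimize input, not target
    loss = torch.nn.functional.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"loss after input optimization: {loss.item():.3f}")
```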
arXiv Detail & Related papers (2022-11-29T14:06:35Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines that use more sophisticated model designs.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
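The sample-specific weighting idea can be sketched as follows; this is a simplified gating network over frozen source models, not SESoM's actual attention mechanism, and all shapes are illustrative:

```python
# Sketch of the sample-specific ensembling idea (not SESoM's exact attention
# mechanism): a small gating network assigns each target example its own
# mixture weights over frozen source models' output probabilities.
import torch

torch.manual_seed(0)
n_sources, d_in, n_classes = 3, 20, 4

# Stand-ins for frozen models, each tuned on a different source task.
sources = [torch.nn.Linear(d_in, n_classes) for _ in range(n_sources)]
for m in sources:
    for p in m.parameters():
        p.requires_grad = False

gate = torch.nn.Linear(d_in, n_sources)          # per-sample source weights

def ensemble_probs(x):
    probs = torch.stack([m(x).softmax(-1) for m in sources], dim=1)  # (B,S,C)
    w = gate(x).softmax(-1).unsqueeze(-1)                            # (B,S,1)
    return (w * probs).sum(dim=1)                                    # (B,C)

# Few-shot target data: learn only the gate.
x = torch.randn(32, d_in)
y = torch.randint(0, n_classes, (32,))
opt = torch.optim.Adam(gate.parameters(), lr=1e-2)
for step in range(100):
    log_probs = ensemble_probs(x).clamp_min(1e-8).log()
    loss = torch.nn.functional.nll_loss(log_probs, y)
    opt.zero_grad(); loss.backward(); opt.step()
print("per-sample weights for first example:", gate(x[:1]).softmax(-1))
```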
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.