Related papers: Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes

URL: http://arxiv.org/abs/2211.13638v1
Date: Thu, 24 Nov 2022 14:38:08 GMT
Title: Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
Authors: Yiqiao Jin, Xiting Wang, Yaru Hao, Yizhou Sun, Xing Xie
Abstract summary: We propose a novel framework for fine-tuning pretrained language models (LMs). Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
Score: 47.880781811936345
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we move towards combining large parametric models with non-parametric prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework for fine-tuning pretrained language models (LMs) that automatically learns a bias to improve predictive performance for varying data sizes, especially in low-resource settings. Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes. Moreover, we propose four principles for effective prototype fine-tuning towards the optimal solution. Experimental results across various datasets show that our work achieves significant performance improvements under various low-resource settings, as well as comparable and usually better performance in high-resource scenarios.
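The abstract pairs a pretrained LM with a non-parametric prototypical network. As a rough illustration of the prototypical part only, the sketch below implements the standard prototypical-network classification rule: each class prototype is the mean embedding of that class's examples, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes. This is not the authors' prototypical fine-tuning method, which also adapts the model capacity and follows four principles not detailed here. The function names and the toy 2-D embeddings are hypothetical; in a real setup the embeddings would come from the LM encoder.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def compute_prototypes(embeddings: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Return one prototype per class: the mean of that class's embeddings."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_probs(query: Vector, prototypes: Dict[int, Vector]) -> Dict[int, float]:
    """Softmax over negative squared distances: the nearest prototype gets the highest probability."""
    logits = {label: -squared_distance(query, proto) for label, proto in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max before exp for numerical stability
    exps = {label: math.exp(logit - max_logit) for label, logit in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical toy 2-D "sentence embeddings"; a real setup would take them from an LM encoder.
    support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]]
    labels = [0, 0, 1, 1]
    protos = compute_prototypes(support, labels)
    probs = prototype_probs([0.1, 0.1], protos)
    print("prototypes:", protos)
    print("predicted class:", max(probs, key=probs.get))
```

Because each prototype is just a class mean, this head has no trainable weights of its own; everything learnable sits in the encoder that produces the embeddings, which is one way to read the abstract's "non-parametric" label.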

Related papers

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models [7.61977883644433]
We propose PRRC to evaluate data quality across Professionalism, Readability, Reasoning, and Cleanliness. We introduce Meta-rater, a multi-dimensional data selection method that integrates these dimensions with existing quality metrics through learned optimal weightings. Experiments demonstrate that Meta-rater doubles convergence speed for 1.3B parameter models and improves downstream task performance by 3.23, with scalable benefits observed in 3.3B models trained on 100B tokens.
arXiv (2025-04-19T06:12:33Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv (2024-11-20T20:38:56Z)
Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning [2.5168710814072894]
This study addresses the practical need for a unified evaluation of models. We propose a reduced search space for each model that allows for quick optimization. For most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations.
arXiv (2024-06-18T07:27:38Z)
Feature Protection For Out-of-distribution Generalization [24.072876186625855]
We show that protecting pre-trained features leads to a fine-tuned model with more robust out-of-distribution (OOD) generalization.
arXiv (2024-05-25T03:00:06Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL trains only 0.5% of the total parameters, using a mode approximation technique. In our experiments, PETAL outperforms current state-of-the-art methods in most scenarios and is also more effective than full fine-tuning.
arXiv (2023-12-16T17:13:08Z)
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation. Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv (2023-07-25T19:03:21Z)
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources [9.359395812292291]
This paper proposes a framework that predicts model performance and supports data selection decisions based on partial samples of prospective data sources. The framework significantly improves on existing performance scaling approaches in terms of both the accuracy of performance inference and the computation cost of constructing the performance predictor. It also outperforms a range of other off-the-shelf solutions in data selection effectiveness by a wide margin.
arXiv (2023-07-05T17:33:41Z)
Building Resilience to Out-of-Distribution Visual Data via Input Optimization and Model Finetuning [13.804184845195296]
We propose a preprocessing model that learns to optimise input data for a specific target vision model. We investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles. We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model.
arXiv (2022-11-29T14:06:35Z)
Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and show that no single model works best in all cases. By choosing an appropriate bias model, we obtain better robustness than baselines that use more sophisticated model designs.
arXiv (2022-10-28T17:52:10Z)
Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks. We propose Sample-specific Ensemble of Source Models (SESoM). SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv (2022-10-23T01:33:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
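The key idea in the SESoM entry ("Model ensemble instead of prompt fusion") is that ensemble weights over source models are recomputed for every target sample rather than fixed once per task. A minimal sketch of that idea follows, assuming a hypothetical dot-product gate between a per-sample feature vector and one learned key vector per source model. This is not SESoM's actual architecture, since the summary does not say how the weights are produced. All names and toy values are illustrative.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_specific_ensemble(
    sample_feature: Sequence[float],
    model_keys: Sequence[Sequence[float]],
    model_probs: Sequence[Sequence[float]],
) -> List[float]:
    """Mix K source-model output distributions using weights computed for this one sample.

    sample_feature: hypothetical feature vector describing the target sample.
    model_keys: hypothetical learned key vector per source model.
    model_probs: each source model's class-probability distribution for the sample.
    """
    # Hypothetical gate: score each source model by a dot product with the sample feature.
    scores = [sum(f * k for f, k in zip(sample_feature, key)) for key in model_keys]
    weights = softmax(scores)  # per-sample weights over source models; they sum to 1
    num_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    keys = [[2.0, 0.0], [0.0, 2.0]]     # one key per source model (toy values)
    outputs = [[0.9, 0.1], [0.2, 0.8]]  # source model A favors class 0, model B favors class 1
    for feature in ([1.0, 0.0], [0.0, 1.0]):
        mixed = sample_specific_ensemble(feature, keys, outputs)
        print(feature, "->", [round(p, 3) for p in mixed])
```

Running it shows the same two source models producing a class-0-leaning mixture for the first sample and a class-1-leaning mixture for the second. A fixed, task-level weighting would instead give both samples the same mixture.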