Improving the Reusability of Pre-trained Language Models in Real-world Applications
- URL: http://arxiv.org/abs/2307.10457v3
- Date: Tue, 8 Aug 2023 04:18:34 GMT
- Title: Improving the Reusability of Pre-trained Language Models in Real-world Applications
- Authors: Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno,
and Hamed Khanpour
- Abstract summary: Mask-tuning integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization.
Experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques.
The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
- Score: 9.534831387705312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reusability of state-of-the-art Pre-trained Language Models (PLMs) is
often limited by their generalization problem, where their performance
drastically decreases when evaluated on examples that differ from the training
dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation
arises from PLMs' reliance on spurious correlations, which work well for
frequent example types but not for general examples. To address this issue, we
propose a training approach called Mask-tuning, which integrates Masked
Language Modeling (MLM) training objectives into the fine-tuning process to
enhance PLMs' generalization. Comprehensive experiments demonstrate that
Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs'
generalization on OOD datasets while improving their performance on
in-distribution datasets. The findings suggest that Mask-tuning improves the
reusability of PLMs on unseen data, making them more practical and effective
for real-world applications.
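The abstract does not spell out implementation details, but the core idea, jointly optimizing an MLM objective and the downstream fine-tuning objective over the same encoder, can be sketched roughly as follows. This is a minimal illustration under assumed choices (a shared BERT encoder, a 15% masking rate, and a fixed weighting between the two losses), not the authors' implementation.

```python
# Illustrative Mask-tuning-style training step: the downstream classification loss
# and an MLM loss on a masked copy of the same batch are optimized together.
# The shared encoder, the 0.15 masking rate, and the 0.5 loss weight are assumptions.
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm_model.bert = clf_model.bert  # share the encoder so both losses update it

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(
    list(clf_model.parameters()) + list(mlm_model.cls.parameters()), lr=2e-5
)
mlm_weight = 0.5  # assumed weighting between the two objectives


def mask_tuning_step(texts, labels):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # Downstream fine-tuning loss (here: binary sentence classification).
    clf_loss = clf_model(**enc, labels=torch.tensor(labels)).loss
    # MLM loss on a randomly masked copy of the same batch.
    masked = collator([{"input_ids": ids} for ids in enc["input_ids"].tolist()])
    mlm_loss = mlm_model(
        input_ids=masked["input_ids"],
        attention_mask=enc["attention_mask"],
        labels=masked["labels"],
    ).loss
    loss = clf_loss + mlm_weight * mlm_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return clf_loss.item(), mlm_loss.item()


print(mask_tuning_step(["a great movie", "a dull plot"], [1, 0]))
```

In practice the masking rate, the loss weighting, and how the MLM head interacts with the task head are all tunable choices that the paper's experiments would determine.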
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose MITA, a Meet-In-The-Middle approach that introduces energy-based optimization to encourage mutual adaptation of the model and the data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z)
- On the Generalization of Preference Learning with DPO [17.420727709895736]
Large language models (LLMs) have demonstrated remarkable capabilities but often struggle to align with human preferences.
Preference learning trains models to distinguish between preferred and non-preferred responses based on human feedback.
This paper introduces a new theoretical framework to analyze the generalization guarantees of models trained with direct preference optimization (DPO); a minimal sketch of the standard DPO objective appears after this list.
arXiv Detail & Related papers (2024-08-06T22:11:00Z)
- Information Guided Regularization for Fine-tuning Language Models [11.831883526217942]
We argue that a more surgical approach to regularization is needed for smoother transfer learning.
We devise a novel approach to dropout for improved model regularization and better downstream generalization.
arXiv Detail & Related papers (2024-06-20T05:18:37Z)
- Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs [25.011675414622392]
This study introduces a novel approach to enhance the reward model's generalization ability against distribution shifts.
We retain the base model's language model head and incorporate a suite of text-generation losses to preserve the hidden states' text-generation capabilities.
Our experimental results demonstrate that the introduced regularization technique markedly improves the accuracy of learned reward models.
arXiv Detail & Related papers (2024-06-14T17:49:59Z)
- From Robustness to Improved Generalization and Calibration in Pre-trained Language Models [0.0]
We investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing the performance of pre-trained language models (PLMs).
We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations.
Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain generalization and calibration in PLMs.
arXiv Detail & Related papers (2024-03-31T18:08:37Z)
- Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the intrinsic generalization ability of Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
- Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
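For context on the preference-learning entry above ("On the Generalization of Preference Learning with DPO"), the objective it analyzes is the standard DPO pairwise logistic loss over policy and reference log-probabilities. The sketch below is a generic reference implementation of that loss, not code from the paper; the function name and the beta value are illustrative assumptions.

```python
# Generic sketch of the standard DPO pairwise loss:
# -log sigmoid(beta * ((log pi(y_w|x) - log pi(y_l|x)) - (log pi_ref(y_w|x) - log pi_ref(y_l|x))))
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Pairwise DPO loss averaged over a batch of preference pairs."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()


# Toy usage with sequence log-probabilities for two preference pairs.
loss = dpo_loss(
    torch.tensor([-12.0, -9.5]),   # log pi_theta(y_w | x)
    torch.tensor([-14.0, -9.0]),   # log pi_theta(y_l | x)
    torch.tensor([-13.0, -9.2]),   # log pi_ref(y_w | x)
    torch.tensor([-13.5, -9.1]),   # log pi_ref(y_l | x)
    beta=0.1,
)
print(loss.item())
```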