Improving the Reusability of Pre-trained Language Models in Real-world
Applications
- URL: http://arxiv.org/abs/2307.10457v3
- Date: Tue, 8 Aug 2023 04:18:34 GMT
- Title: Improving the Reusability of Pre-trained Language Models in Real-world
Applications
- Authors: Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno,
and Hamed Khanpour
- Abstract summary: Mask-tuning integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization.
Experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques.
The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
- Score: 9.534831387705312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reusability of state-of-the-art Pre-trained Language Models (PLMs) is
often limited by their generalization problem, where their performance
drastically decreases when evaluated on examples that differ from the training
dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation
arises from PLMs' reliance on spurious correlations, which work well for
frequent example types but not for general examples. To address this issue, we
propose a training approach called Mask-tuning, which integrates Masked
Language Modeling (MLM) training objectives into the fine-tuning process to
enhance PLMs' generalization. Comprehensive experiments demonstrate that
Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs'
generalization on OOD datasets while improving their performance on
in-distribution datasets. The findings suggest that Mask-tuning improves the
reusability of PLMs on unseen data, making them more practical and effective
for real-world applications.
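To make the training recipe above concrete, the sketch below shows one way a masked-language-modeling loss could be optimized jointly with a downstream classification loss over a shared encoder, in plain PyTorch. The toy encoder, the 15% masking rate, the mean-pooled classification head, and the weighting scalar `alpha` are illustrative assumptions for this sketch, not the paper's exact Mask-tuning recipe.

```python
# Minimal sketch: joint MLM + downstream-task fine-tuning over a shared encoder.
# All sizes, the masking rate, and the loss weight `alpha` are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, MASK_ID, PAD_ID = 1000, 1, 0

class SharedEncoder(nn.Module):
    """Stand-in for a pre-trained Transformer encoder (e.g., BERT)."""
    def __init__(self, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ids):
        return self.encoder(self.embed(ids))            # (B, T, d_model)

class MaskTuningModel(nn.Module):
    """Shared encoder feeding both an MLM head and a classification head."""
    def __init__(self, n_classes=2, d_model=64):
        super().__init__()
        self.encoder = SharedEncoder(d_model)
        self.mlm_head = nn.Linear(d_model, VOCAB)        # token-level vocabulary logits
        self.cls_head = nn.Linear(d_model, n_classes)    # sequence-level class logits

    def forward(self, ids):
        h = self.encoder(ids)
        return self.mlm_head(h), self.cls_head(h.mean(dim=1))

def mask_tokens(ids, mlm_prob=0.15):
    """Randomly replace tokens with [MASK]; unmasked positions get label -100."""
    labels = ids.clone()
    mask = (torch.rand(ids.shape) < mlm_prob) & (ids != PAD_ID)
    labels[~mask] = -100                                 # ignored by cross_entropy
    return ids.masked_fill(mask, MASK_ID), labels

def mask_tuning_step(model, ids, cls_labels, alpha=0.5):
    """One update combining the MLM loss and the downstream task loss on the same batch."""
    masked_ids, mlm_labels = mask_tokens(ids)
    mlm_logits, cls_logits = model(masked_ids)
    mlm_loss = nn.functional.cross_entropy(
        mlm_logits.view(-1, VOCAB), mlm_labels.view(-1), ignore_index=-100)
    task_loss = nn.functional.cross_entropy(cls_logits, cls_labels)
    return alpha * mlm_loss + (1.0 - alpha) * task_loss

# Toy usage: a single optimization step on random data.
model = MaskTuningModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
ids = torch.randint(2, VOCAB, (8, 32))                   # batch of token ids
cls_labels = torch.randint(0, 2, (8,))
loss = mask_tuning_step(model, ids, cls_labels)
loss.backward()
optimizer.step()
```

In practice the shared encoder would be an actual pre-trained model (e.g., BERT or RoBERTa) loaded together with its pre-trained MLM head rather than the randomly initialized toy used here, so that the auxiliary MLM objective regularizes fine-tuning toward the representations learned during pre-training.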
Related papers
- Information Guided Regularization for Fine-tuning Language Models [11.831883526217942]
We argue that a more surgical approach to regularization needs to exist for smoother transfer learning.
We devise a novel approach to dropout for improved model regularization and better downstream generalization.
arXiv Detail & Related papers (2024-06-20T05:18:37Z) - Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs [25.011675414622392]
Reward models trained on human preference data have been proven to be effective for aligning Large Language Models with human intent.
However, the generalization capabilities of current reward models to unseen prompts and responses are limited.
Our study proposes a novel approach to enhance the reward model's generalization ability against distribution shifts by regularizing the hidden states.
arXiv Detail & Related papers (2024-06-14T17:49:59Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: the label smoothing value used during training is set adaptively according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - From Robustness to Improved Generalization and Calibration in Pre-trained Language Models [0.0]
We investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing pre-trained language models (PLMs) performance.
We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations.
Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain generalization and calibration in PLMs.
arXiv Detail & Related papers (2024-03-31T18:08:37Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in
LLMs [6.689848416609951]
We study the interplay between unlearning and fairness for large language models (LLMs).
We focus on a popular unlearning framework known as SISA, which creates an ensemble of models trained on disjoint shards.
We propose post-processing bias mitigation techniques for ensemble models produced by SISA.
arXiv Detail & Related papers (2023-12-12T16:44:47Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Can LMs Generalize to Future Data? An Empirical Analysis on Text
Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in
Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - Provable Generalization of Overparameterized Meta-learning Trained with
SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.