Improving the Reusability of Pre-trained Language Models in Real-world
Applications
- URL: http://arxiv.org/abs/2307.10457v3
- Date: Tue, 8 Aug 2023 04:18:34 GMT
- Title: Improving the Reusability of Pre-trained Language Models in Real-world
Applications
- Authors: Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno,
and Hamed Khanpour
- Abstract summary: Mask-tuning integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization.
Experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques.
The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
- Score: 9.534831387705312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reusability of state-of-the-art Pre-trained Language Models (PLMs) is
often limited by their generalization problem, where their performance
drastically decreases when evaluated on examples that differ from the training
dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation
arises from PLMs' reliance on spurious correlations, which work well for
frequent example types but not for general examples. To address this issue, we
propose a training approach called Mask-tuning, which integrates Masked
Language Modeling (MLM) training objectives into the fine-tuning process to
enhance PLMs' generalization. Comprehensive experiments demonstrate that
Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs'
generalization on OOD datasets while improving their performance on
in-distribution datasets. The findings suggest that Mask-tuning improves the
reusability of PLMs on unseen data, making them more practical and effective
for real-world applications.
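To make the training recipe above concrete, the sketch below shows one way a masked-language-modeling loss could be optimized jointly with a downstream classification loss over a shared encoder, in plain PyTorch. The toy encoder, the 15% masking rate, the mean-pooled classification head, and the weighting scalar `alpha` are illustrative assumptions for this sketch, not the paper's exact Mask-tuning recipe.

```python
# Minimal sketch: joint MLM + downstream-task fine-tuning over a shared encoder.
# All sizes, the masking rate, and the loss weight `alpha` are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, MASK_ID, PAD_ID = 1000, 1, 0

class SharedEncoder(nn.Module):
    """Stand-in for a pre-trained Transformer encoder (e.g., BERT)."""
    def __init__(self, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ids):
        return self.encoder(self.embed(ids))            # (B, T, d_model)

class MaskTuningModel(nn.Module):
    """Shared encoder feeding both an MLM head and a classification head."""
    def __init__(self, n_classes=2, d_model=64):
        super().__init__()
        self.encoder = SharedEncoder(d_model)
        self.mlm_head = nn.Linear(d_model, VOCAB)        # token-level vocabulary logits
        self.cls_head = nn.Linear(d_model, n_classes)    # sequence-level class logits

    def forward(self, ids):
        h = self.encoder(ids)
        return self.mlm_head(h), self.cls_head(h.mean(dim=1))

def mask_tokens(ids, mlm_prob=0.15):
    """Randomly replace tokens with [MASK]; unmasked positions get label -100."""
    labels = ids.clone()
    mask = (torch.rand(ids.shape) < mlm_prob) & (ids != PAD_ID)
    labels[~mask] = -100                                 # ignored by cross_entropy
    return ids.masked_fill(mask, MASK_ID), labels

def mask_tuning_step(model, ids, cls_labels, alpha=0.5):
    """One update combining the MLM loss and the downstream task loss on the same batch."""
    masked_ids, mlm_labels = mask_tokens(ids)
    mlm_logits, cls_logits = model(masked_ids)
    mlm_loss = nn.functional.cross_entropy(
        mlm_logits.view(-1, VOCAB), mlm_labels.view(-1), ignore_index=-100)
    task_loss = nn.functional.cross_entropy(cls_logits, cls_labels)
    return alpha * mlm_loss + (1.0 - alpha) * task_loss

# Toy usage: a single optimization step on random data.
model = MaskTuningModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
ids = torch.randint(2, VOCAB, (8, 32))                   # batch of token ids
cls_labels = torch.randint(0, 2, (8,))
loss = mask_tuning_step(model, ids, cls_labels)
loss.backward()
optimizer.step()
```

In practice the shared encoder would be an actual pre-trained model (e.g., BERT or RoBERTa) loaded together with its pre-trained MLM head rather than the randomly initialized toy used here, so that the auxiliary MLM objective regularizes fine-tuning toward the representations learned during pre-training.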
Related papers
- Information Guided Regularization for Fine-tuning Language Models [11.831883526217942]
We argue that a more surgical approach to regularization needs to exist for smoother transfer learning.
We devise a novel approach to dropout for improved model regularization and better downstream generalization.
arXiv Detail & Related papers (2024-06-20T05:18:37Z) - Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs [25.011675414622392]
Reward models trained on human preference data have been proven to be effective for aligning Large Language Models with human intent.
However, the generalization capabilities of current reward models to unseen prompts and responses are limited.
Our study proposes a novel approach to enhance the reward model's generalization ability against distribution shifts by regularizing the hidden states.
arXiv Detail & Related papers (2024-06-14T17:49:59Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: the label smoothing value used during training is set adaptively according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - From Robustness to Improved Generalization and Calibration in Pre-trained Language Models [0.0]
We investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing pre-trained language models (PLMs) performance.
We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations.
Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain generalization and calibration in PLMs.
arXiv Detail & Related papers (2024-03-31T18:08:37Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in
LLMs [6.689848416609951]
We study the interplay between unlearning and fairness for large language models (LLMs).
We focus on a popular unlearning framework known as SISA, which creates an ensemble of models trained on disjoint shards.
We propose post-processing bias mitigation techniques for ensemble models produced by SISA.
arXiv Detail & Related papers (2023-12-12T16:44:47Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Can LMs Generalize to Future Data? An Empirical Analysis on Text
Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in
Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - Provable Generalization of Overparameterized Meta-learning Trained with
SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.