Fine-Tuning is Fine, if Calibrated
- URL: http://arxiv.org/abs/2409.16223v3
- Date: Sun, 13 Oct 2024 23:07:33 GMT
- Title: Fine-Tuning is Fine, if Calibrated
- Authors: Zheda Mai, Arpita Chowdhury, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun Chao
- Abstract summary: Fine-tuning a pre-trained model on a subset of its classes is shown to drastically degrade the model's accuracy on the other classes it had previously learned.
This paper systematically dissects the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?"
We find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes.
- Score: 33.42198023647517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the model's accuracy in the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?" To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
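To make the abstract's fix concrete, here is a minimal sketch of the post-processing calibration idea: add a single bias to the logits of the classes absent from fine-tuning so their scale matches the fine-tuning classes. The function name, the use of one scalar `gamma`, and picking it on held-out data are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np

def calibrate_logits(logits, finetune_classes, gamma):
    """Add a scalar bias to the classes absent from fine-tuning.

    The abstract attributes the accuracy drop to discrepant logit
    scales between the fine-tuning classes and the remaining classes;
    boosting the absent classes by a single scalar `gamma` (tuned on
    held-out data) is one simple way to re-balance them.
    """
    calibrated = logits.copy()
    absent = np.setdiff1d(np.arange(logits.shape[-1]), finetune_classes)
    calibrated[..., absent] += gamma
    return calibrated

# Toy usage: a 1000-way classifier fine-tuned on classes 0-99.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1000))
preds = calibrate_logits(logits, np.arange(100), gamma=2.0).argmax(-1)
```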
Related papers
- Clarify: Improving Model Robustness With Natural Language Corrections [59.041682704894555]
The standard way to teach models is through supervised training on large datasets.
This approach often teaches models incorrect concepts because they pick up on misleading (spurious) signals in the data.
We propose Clarify, a novel interface and method for interactively correcting model misconceptions.
arXiv Detail & Related papers (2024-02-06T05:11:38Z)
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy can come at the cost of hurting individual class accuracy by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
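The summary names class-conditional augmentation but not its mechanics, so the following is a hedged sketch of one simple instantiation: route the classes hurt by strong augmentation through a milder transform. The wrapper class, the transforms, and the hurt-class set are all assumptions, not the paper's exact method.

```python
from torch.utils.data import Dataset

class ClassConditionalAugment(Dataset):
    """Route classes hurt by strong augmentation through a milder one.

    Hypothetical wrapper: `hurt_classes` would be found by comparing
    per-class validation accuracy with and without strong augmentation.
    """
    def __init__(self, base, hurt_classes, strong_tf, mild_tf):
        self.base = base
        self.hurt = set(hurt_classes)
        self.strong_tf, self.mild_tf = strong_tf, mild_tf

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]
        tf = self.mild_tf if y in self.hurt else self.strong_tf
        return tf(x), y
```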
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
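A minimal sketch of the up-scaling arithmetic described above, assuming the standard EFT combination of next-token logits; treating `alpha` as a test-time knob for behavioral trade-offs is an assumption of this sketch.

```python
import numpy as np

def eft_upscale(large_pt, small_ft, small_pt, alpha=1.0):
    """Emulated fine-tuning / up-scaling on next-token logits.

    Adds the behavioral delta learned by a small fine-tuned model
    (small_ft - small_pt) to a large pre-trained model's logits,
    approximating "large pre-training + fine-tuning" without training
    the large model. `alpha` scales the delta, e.g. to trade off
    traits like helpfulness vs. harmlessness at test time.
    """
    return large_pt + alpha * (small_ft - small_pt)

# Toy next-token logits over a small vocabulary.
rng = np.random.default_rng(1)
lp, sf, sp = (rng.normal(size=32_000) for _ in range(3))
probs = np.exp(eft_upscale(lp, sf, sp))
probs /= probs.sum()
```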
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
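The debiasing step can be sketched as classic logit adjustment: subtract an estimated log class-prior from the zero-shot logits. How GLA actually estimates the foundation model's label bias is not shown here; `est_log_prior` and `tau` are assumptions of this sketch.

```python
import numpy as np

def adjust_logits(zero_shot_logits, est_log_prior, tau=1.0):
    """Remove an estimated label bias from zero-shot logits.

    If the pre-training data over-represents some classes, their
    zero-shot logits are inflated; subtracting tau * log(prior)
    counteracts that bias. Estimating the prior is the hard part;
    here it is assumed to be given.
    """
    return zero_shot_logits - tau * est_log_prior

# Toy: a class with a large estimated prior gets pushed down.
logits = np.array([2.0, 1.5, 1.4])
prior = np.log(np.array([0.6, 0.2, 0.2]))
print(adjust_logits(logits, prior).argmax())  # no longer class 0
```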
arXiv Detail & Related papers (2023-10-12T08:01:11Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
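The summary gives no mechanism, so the following is a speculative sketch of the random-projection idea named in the title: expand frozen pre-trained features through a fixed random matrix and update only closed-form linear-head statistics per task, so earlier tasks are never overwritten. The dimensions, the ReLU, and the ridge solve are all assumptions.

```python
import numpy as np

class RandomProjectionHead:
    """Fixed random projection + incrementally updated linear head."""

    def __init__(self, feat_dim, proj_dim=2000, n_classes=100, lam=1.0):
        rng = np.random.default_rng(0)
        self.W = rng.normal(size=(feat_dim, proj_dim))  # frozen projection
        self.G = lam * np.eye(proj_dim)  # Gram accumulator (ridge term)
        self.C = np.zeros((proj_dim, n_classes))

    def update(self, feats, labels):
        # Accumulate statistics only; no weights from earlier tasks change.
        h = np.maximum(feats @ self.W, 0.0)
        self.G += h.T @ h
        for c in np.unique(labels):
            self.C[:, c] += h[labels == c].sum(axis=0)

    def predict(self, feats):
        h = np.maximum(feats @ self.W, 0.0)
        beta = np.linalg.solve(self.G, self.C)  # closed-form head
        return (h @ beta).argmax(axis=1)
```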
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Are Deep Sequence Classifiers Good at Non-Trivial Generalization? [4.941630596191806]
We study binary sequence classification problems and look at model calibration from a different perspective.
We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep-learning sequence classification models.
Our results suggest that in this binary setting the deep-learning models are indeed able to learn the underlying class distribution in a non-trivial manner.
arXiv Detail & Related papers (2022-10-24T10:01:06Z)
- Attaining Class-level Forgetting in Pretrained Model using Few Samples [18.251805180282346]
Classes a model was pretrained on may later become restricted due to privacy/ethical concerns, requiring the model to forget them.
We propose a novel approach to address this problem without affecting the model's prediction power for the remaining classes.
arXiv Detail & Related papers (2022-10-19T15:36:01Z)
- Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data [11.66734752179563]
Classification on long-tailed distributed data is a challenging problem.
Learning tail classes is especially challenging when fine-tuning a pretrained model on a downstream task.
We propose a two-stage fine-tuning strategy: first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, then perform standard fine-tuning, as sketched below.
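A sketch of the two-stage recipe in PyTorch. The summary names the reweighting loss only generically, so the effective-number class-balanced weights, the optimizer, and the epoch counts here are assumptions.

```python
import torch
import torch.nn as nn

def class_balanced_weights(counts, beta=0.9999):
    # Effective-number weights (1 - beta) / (1 - beta^n) per class;
    # this particular reweighting is an assumption of the sketch.
    counts = torch.as_tensor(counts, dtype=torch.float)
    w = (1.0 - beta) / (1.0 - beta ** counts)
    return w * len(counts) / w.sum()

def two_stage_finetune(backbone, head, loader, counts, epochs=(5, 20)):
    stages = [
        # Stage 1: final layer only, class-balanced loss.
        (list(head.parameters()),
         nn.CrossEntropyLoss(weight=class_balanced_weights(counts)),
         epochs[0]),
        # Stage 2: standard fine-tuning of the whole network.
        (list(backbone.parameters()) + list(head.parameters()),
         nn.CrossEntropyLoss(),
         epochs[1]),
    ]
    for params, loss_fn, n_epochs in stages:
        opt = torch.optim.SGD(params, lr=1e-2, momentum=0.9)
        for _ in range(n_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(head(backbone(x)), y).backward()
                opt.step()
```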
arXiv Detail & Related papers (2022-07-22T03:39:51Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers [24.858283637038422]
We study three different pre-trained models: BERT, RoBERTa, and ALBERT.
We find that for some probing tasks fine-tuning leads to substantial changes in accuracy.
While fine-tuning indeed changes the representations of a pre-trained model, only in very few cases does fine-tuning have a positive effect on probing accuracy.
arXiv Detail & Related papers (2020-10-06T10:54:00Z)