Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
- URL: http://arxiv.org/abs/2309.10019v3
- Date: Sat, 1 Jun 2024 09:59:01 GMT
- Title: Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
- Authors: Jiang-Xin Shi, Tong Wei, Zhi Zhou, Jie-Jing Shao, Xin-Yan Han, Yu-Feng Li
- Abstract summary: In this paper, we show that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes.
We develop LIFT, a low-complexity and accurate long-tail learning algorithm, with the goal of facilitating fast prediction and compact models.
- Score: 42.693469918949006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fine-tuning paradigm for addressing long-tail learning tasks has sparked significant interest since the emergence of foundation models. Nonetheless, how fine-tuning impacts performance in long-tail learning has not been explicitly quantified. In this paper, we show that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes, and that lightweight fine-tuning is more effective. We attribute this to inconsistent class conditions caused by heavy fine-tuning. Based on this observation, we develop LIFT, a low-complexity and accurate long-tail learning algorithm that facilitates fast prediction and compact models through adaptive lightweight fine-tuning. Experiments verify that both the training time and the number of learned parameters are significantly reduced, with more accurate predictive performance than state-of-the-art approaches. The implementation code is available at https://github.com/shijxcs/LIFT.
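As a concrete illustration of the lightweight fine-tuning the abstract advocates, here is a minimal PyTorch sketch that freezes a stand-in backbone and trains only a linear classifier head. The backbone, feature sizes, and optimizer are placeholder assumptions for illustration, not LIFT's actual adaptive design (see the repository above for that).

```python
import torch
import torch.nn as nn

# Placeholder backbone standing in for a pretrained foundation model
# (e.g., a ViT or CLIP image encoder); its weights stay frozen.
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
for p in backbone.parameters():
    p.requires_grad = False  # freeze: no gradients flow to the backbone

num_classes = 100
classifier = nn.Linear(256, num_classes)  # the only trainable module

# Only the classifier's (few) parameters are handed to the optimizer.
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

features = torch.randn(32, 512)            # stand-in for a batch of inputs
labels = torch.randint(0, num_classes, (32,))

with torch.no_grad():                      # frozen backbone needs no graph
    z = backbone(features)
logits = classifier(z)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

The appeal of this regime is that the trainable parameter count is the classifier alone, which is what makes fast prediction and compact models possible.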
Related papers
- LIFT+: Lightweight Fine-Tuning for Long-Tail Learning [45.187004699024435]
LIFT+ is a lightweight fine-tuning framework that optimizes for consistent class conditions.
Our framework provides an efficient and accurate pipeline that facilitates fast convergence and model compactness.
arXiv Detail & Related papers (2025-04-17T18:50:47Z)
- Fine-Tuning is Fine, if Calibrated [33.42198023647517]
Fine-tuning a pre-trained model can drastically degrade its accuracy on the other classes it had previously learned.
This paper systematically dissects the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?"
We find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes.
arXiv Detail & Related papers (2024-09-24T16:35:16Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
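A minimal sketch of the sparse-increment idea, under the assumption of a top-k gradient-magnitude selection rule and a plain SGD step; SIFT's actual selection criterion, schedule, and optimizer may differ.

```python
import torch
import torch.nn as nn

def sparse_update(model: nn.Module, lr: float = 1e-3, density: float = 0.01):
    """After loss.backward(), zero out all but the largest-magnitude
    gradient entries per tensor, then apply a plain SGD step.
    A rough sketch of sparse-increment fine-tuning, not SIFT's exact rule."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad.flatten()
            k = max(1, int(density * g.numel()))
            idx = torch.topk(g.abs(), k).indices   # k largest |gradient| entries
            mask = torch.zeros_like(g)
            mask[idx] = 1.0
            p.add_((g * mask).view_as(p), alpha=-lr)

model = nn.Linear(128, 10)
x, y = torch.randn(16, 128), torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
sparse_update(model)  # only ~1% of each tensor's entries move
```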
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
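Read off the abstract, up-scaling amounts to simple logit arithmetic: add the behavioral delta of a small fine-tuned model to a large base model's logits. The `beta` interpolation knob and the exact form below are illustrative assumptions, not the paper's verified formula.

```python
import torch

def upscaled_logits(large_base, small_base, small_ft, beta: float = 1.0):
    """LM up-scaling, sketched from the abstract: combine a large base
    model's logits with the behavioral delta learned by a small
    fine-tuned model. `beta` trades off behaviors at test time (an
    assumption). All arguments are next-token logits of the same shape."""
    return large_base + beta * (small_ft - small_base)

vocab = 32000
l_large = torch.randn(vocab)     # large pretrained model, no fine-tuning
l_small = torch.randn(vocab)     # small pretrained model
l_small_ft = torch.randn(vocab)  # small model after fine-tuning

probs = torch.softmax(upscaled_logits(l_large, l_small, l_small_ft), dim=-1)
next_token = torch.multinomial(probs, 1)
```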
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning [52.021899899683675]
In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples.
We propose an Orthogonal Uncertainty Representation (OUR) of feature embeddings and an end-to-end training strategy to improve model robustness under long-tailed distributions.
arXiv Detail & Related papers (2023-10-16T05:50:34Z)
- Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data [11.66734752179563]
Classification on long-tailed distributed data is a challenging problem.
Learning tail classes is especially challenging when fine-tuning a pretrained model for a downstream task.
We propose a two-stage fine-tuning strategy: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning.
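A hedged sketch of this two-stage recipe, assuming "effective number" class weights as the class-balanced reweighting (one common choice; the paper's loss may differ) and showing a single optimizer step per stage:

```python
import torch
import torch.nn as nn

def class_balanced_weights(counts, beta: float = 0.999):
    """'Effective number' class weights, one common class-balanced
    reweighting (an assumption; the paper may use a different scheme)."""
    counts = torch.as_tensor(counts, dtype=torch.float)
    weights = (1.0 - beta) / (1.0 - beta ** counts)
    return weights * len(counts) / weights.sum()  # normalize to mean 1

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
counts = [1000] * 5 + [10] * 5                 # long-tailed class sizes
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))

# Stage 1: update only the final layer, using the reweighted loss
# (gradients still reach the backbone but are never applied).
cb_loss = nn.CrossEntropyLoss(weight=class_balanced_weights(counts))
opt1 = torch.optim.SGD(model[-1].parameters(), lr=1e-2)
opt1.zero_grad()
cb_loss(model(x), y).backward()
opt1.step()

# Stage 2: standard fine-tuning of all parameters with the plain loss.
opt2 = torch.optim.SGD(model.parameters(), lr=1e-3)
opt2.zero_grad()
nn.functional.cross_entropy(model(x), y).backward()
opt2.step()
```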
arXiv Detail & Related papers (2022-07-22T03:39:51Z)
- Towards Inadequately Pre-trained Models in Transfer Learning [37.66278189011681]
Better ImageNet pre-trained models have been demonstrated to have better transferability to downstream tasks.
In this paper, we find that, within the same pre-training process, models from intermediate epochs, which are inadequately pre-trained, can outperform fully trained models.
Our discoveries suggest that, during pre-training, models tend to first learn spectral components corresponding to large singular values.
arXiv Detail & Related papers (2022-03-09T12:15:55Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
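For intuition, the sketch below trains only a short sequence of virtual input tokens plus a small head over a frozen encoder. This is a simplification: genuine prefix-tuning prepends trainable key/value prefixes inside every attention layer, so treat the names and sizes here as illustrative assumptions.

```python
import torch
import torch.nn as nn

# Frozen stand-in for a pretrained transformer encoder.
d_model, vocab = 128, 1000
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad = False

# The only trainable parameters: a few "virtual tokens" and a task head.
prefix_len, num_labels, batch = 8, 5, 4
prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
classifier = nn.Linear(d_model, num_labels)

tokens = torch.randint(0, vocab, (batch, 16))        # batch of sequences
h = torch.cat([prefix.expand(batch, -1, -1), embed(tokens)], dim=1)
logits = classifier(encoder(h).mean(dim=1))          # mean-pool then classify

loss = nn.functional.cross_entropy(logits, torch.randint(0, num_labels, (batch,)))
loss.backward()  # gradients reach only `prefix` and `classifier`
```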
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of existing large-batch training variations can be covered by a unified extrapolation framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.