Model Balancing Helps Low-data Training and Fine-tuning
- URL: http://arxiv.org/abs/2410.12178v1
- Date: Wed, 16 Oct 2024 02:48:39 GMT
- Title: Model Balancing Helps Low-data Training and Fine-tuning
- Authors: Zihang Liu, Yuanzhe Hu, Tianyu Pang, Yefan Zhou, Pu Ren, Yaoqing Yang,
- Abstract summary: Recent advances in foundation models have emphasized the need to align pre-trained models with specialized domains.
These topics have also gained increasing attention in the emerging field of scientific machine learning (SciML).
To address the limitations of low-data training and fine-tuning, we draw inspiration from Heavy-Tailed Self-Regularization (HT-SR) theory.
We adapt a recently proposed layer-wise learning rate scheduler, TempBalance, which effectively balances training quality across layers.
- Score: 19.63134953504884
- Abstract: Recent advances in foundation models have emphasized the need to align pre-trained models with specialized domains using small, curated datasets. Studies on these foundation models underscore the importance of low-data training and fine-tuning. This topic, well-known in natural language processing (NLP), has also gained increasing attention in the emerging field of scientific machine learning (SciML). To address the limitations of low-data training and fine-tuning, we draw inspiration from Heavy-Tailed Self-Regularization (HT-SR) theory, analyzing the shape of empirical spectral densities (ESDs) and revealing an imbalance in training quality across different model layers. To mitigate this issue, we adapt a recently proposed layer-wise learning rate scheduler, TempBalance, which effectively balances training quality across layers and enhances low-data training and fine-tuning for both NLP and SciML tasks. Notably, TempBalance demonstrates increasing performance gains as the amount of available tuning data decreases. Comparative analyses further highlight the effectiveness of TempBalance and its adaptability as an "add-on" method for improving model performance.
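The abstract gives the idea only in outline: measure the shape of each layer's empirical spectral density (ESD), then balance layers through layer-wise learning rates. The following is a minimal sketch of that idea and not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the tail shape is summarized by a Hill estimate of a power-law exponent, the fraction of eigenvalues used for the tail (`k_frac`) is a fixed heuristic, and layers with a larger exponent receive a proportionally larger learning rate through a linear map bounded by `s_min` and `s_max`. The function names `hill_alpha` and `balanced_lrs` are hypothetical.

```python
import numpy as np


def hill_alpha(weight, k_frac=0.5):
    """Hill estimate of the power-law exponent of a layer's ESD tail.

    The ESD here is the eigenvalue distribution of W^T W, whose eigenvalues
    are the squared singular values of W. k_frac is an assumed heuristic.
    """
    eigs = np.sort(np.linalg.svd(weight, compute_uv=False) ** 2)
    n = len(eigs)
    # Use the k largest eigenvalues as the tail, leaving one as the threshold.
    k = min(max(1, int(n * k_frac)), n - 1)
    tail = eigs[n - k:]
    threshold = eigs[n - k - 1]
    return 1.0 + k / np.sum(np.log(tail / threshold))


def balanced_lrs(alphas, base_lr, s_min=0.5, s_max=1.5):
    """Map per-layer exponents linearly onto [s_min, s_max] * base_lr.

    Assumption: a larger exponent gets a larger learning rate.
    """
    alphas = np.asarray(alphas, dtype=float)
    span = alphas.max() - alphas.min()
    if span == 0.0:
        # All layers look alike, so every layer keeps the base learning rate.
        return np.full_like(alphas, base_lr)
    scale = s_min + (s_max - s_min) * (alphas - alphas.min()) / span
    return base_lr * scale


# Toy usage on random matrices standing in for layer weights.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)) for _ in range(3)]
alphas = [hill_alpha(w) for w in layers]
print(balanced_lrs(alphas, base_lr=1e-3))
```

In a real training loop the exponents would be recomputed periodically and the resulting rates written into the optimizer's per-layer parameter groups; how often to do so is left open here.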
Related papers
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs [11.572835837392867]
This study introduces GPTA, a Large Language Model assistance training framework, that enhances the training of downstream task models via prefix prompt.
By minimizing data exposure to LLM, the framework addresses the security and legal challenges of applying LLM in downstream task model training.
arXiv Detail & Related papers (2024-03-29T23:04:04Z)
- Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency [26.86557244460215]
We introduce an efficient and robust fine-tuning framework specifically designed to mitigate biases in new tasks.
Our empirical analysis shows that the parameters in the pre-trained model that affect predictions for different demographic groups are different.
We employ a transfer learning strategy that neutralizes the importance of these influential weights, determined using Fisher information across demographic groups.
arXiv Detail & Related papers (2024-03-01T16:01:28Z)
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multi-lingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance.
We also show that CF augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z)
- Alleviating the Effect of Data Imbalance on Adversarial Training [26.36714114672729]
We study adversarial training on datasets that obey the long-tailed distribution.
We propose a new adversarial training framework, Re-balancing Adversarial Training (REAT).
arXiv Detail & Related papers (2023-07-14T07:01:48Z)
- Rethinking Soft Label in Label Distribution Learning Perspective [0.27719338074999533]
The primary goal of training early convolutional neural networks (CNNs) was higher generalization performance.
We observed that performing label distribution learning (LDL) improves model calibration in CNN training.
We performed several visualizations and analyses and observed several interesting behaviors in CNN training with LDL.
arXiv Detail & Related papers (2023-01-31T06:47:19Z)
- FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over the reweighted data set where the sample weights are computed.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.