Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization
- URL: http://arxiv.org/abs/2402.14270v2
- Date: Fri, 1 Mar 2024 15:21:16 GMT
- Title: Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization
- Authors: Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen,
Yingbin Liang, Mingyuan Zhou, Zhangyang Wang
- Abstract summary: A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
- Score: 165.98557106089777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly advancing arena of large language models (LLMs), a key
challenge is to enhance their capabilities amid a looming shortage of
high-quality training data. Our study starts from an empirical strategy for the
light continual training of LLMs using their original pre-training data sets,
with a specific focus on selective retention of samples that incur moderately
high losses. These samples are deemed informative and beneficial for model
refinement, contrasting with the highest-loss samples, which would be discarded
due to their correlation with data noise and complexity. We then formalize this
strategy into a principled framework of Instance-Reweighted Distributionally
Robust Optimization (IR-DRO). IR-DRO is designed to dynamically prioritize the
training focus on informative samples through an instance reweighting
mechanism, streamlined by a closed-form solution for straightforward
integration into established training protocols. Through rigorous
experimentation with various models and datasets, our findings indicate that
our sample-targeted methods significantly improve LLM performance across
multiple benchmarks, in both continual pre-training and instruction tuning
scenarios. Our codes are available at
https://github.com/VITA-Group/HardFocusTraining.
Related papers
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs)
MIR is indicative about training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training language models (LLMs)
We find that Ask-LLM and Density sampling are the best methods in their respective categories.
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR)
Our approach optimally adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.