Semi-supervised Fine-tuning for Large Language Models
- URL: http://arxiv.org/abs/2410.14745v2
- Date: Wed, 19 Feb 2025 15:32:29 GMT
- Title: Semi-supervised Fine-tuning for Large Language Models
- Authors: Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang
- Abstract summary: We introduce a semi-supervised fine-tuning (SemiFT) task and a framework named SemiEvol for LLM alignment.
For knowledge propagation, SemiEvol adopts a bi-level approach, propagating knowledge from labeled data to unlabeled data.
For knowledge selection, SemiEvol incorporates a collaborative learning mechanism, selecting higher-quality pseudo-response samples.
- Score: 14.782756931646627
- Abstract: Supervised fine-tuning (SFT) is crucial in adapting large language models (LLMs) to a specific domain or task. However, only a limited amount of labeled data is available in practical applications, which poses a severe challenge for SFT in yielding satisfactory results. A data-efficient framework that can fully exploit labeled and unlabeled data for LLM fine-tuning is therefore highly desirable. Towards this end, we introduce a semi-supervised fine-tuning (SemiFT) task and a framework named SemiEvol for LLM alignment in a propagate-and-select manner. For knowledge propagation, SemiEvol adopts a bi-level approach, propagating knowledge from labeled data to unlabeled data through both in-weight and in-context methods. For knowledge selection, SemiEvol incorporates a collaborative learning mechanism, selecting higher-quality pseudo-response samples. We conducted experiments using GPT-4o-mini and Llama-3.1 on seven general or domain-specific datasets, demonstrating significant improvements in model performance on target data. Furthermore, we compared SemiEvol with SFT and self-evolution methods, highlighting its practicality in hybrid data scenarios.
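The abstract names the two mechanisms but not their implementation, so the following is a minimal Python sketch of one propagate-and-select round; the `finetune` callable, the fixed few-shot retrieval, and the confidence-threshold selection rule are all assumptions made for illustration, not the paper's method.

```python
from typing import Callable, List, Tuple

# A generator maps (prompt, few-shot context) to (response, confidence).
Generator = Callable[[str, List[Tuple[str, str]]], Tuple[str, float]]

def semievol_round(
    finetune: Callable[[List[Tuple[str, str]]], Generator],
    labeled: List[Tuple[str, str]],   # (prompt, response) pairs
    unlabeled: List[str],             # prompts without responses
    n_collaborators: int = 3,
    conf_threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    # In-weight propagation: adapt the model on the labeled split.
    model = finetune(labeled)

    selected: List[Tuple[str, str]] = []
    for prompt in unlabeled:
        # In-context propagation: reuse labeled examples as few-shot context
        # (a fixed prefix here; the abstract only says "in-context methods").
        context = labeled[:4]

        # Collaborative selection: several sampled generations compete, and
        # the most confident pseudo-response survives if it clears a threshold.
        candidates = [model(prompt, context) for _ in range(n_collaborators)]
        response, confidence = max(candidates, key=lambda c: c[1])
        if confidence >= conf_threshold:
            selected.append((prompt, response))

    # Selected pseudo-labeled pairs join the labeled pool for the next round.
    return labeled + selected
```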
Related papers
- SampleLLM: Optimizing Tabular Data Synthesis in Recommendations [46.689486044254544]
Tabular data synthesis is crucial in machine learning, yet existing general methods are highly data-dependent and often fall short in recommender systems.
This limitation arises from their difficulty in capturing complex distributions and understanding feature relationships from sparse and limited data.
We propose a novel two-stage framework named SampleLLM to improve the quality of LLM-based data synthesis for recommendation tasks.
arXiv Detail & Related papers (2025-01-27T15:12:27Z)
- Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models [12.500777267361102]
We introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT.
The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data.
PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models.
arXiv Detail & Related papers (2024-12-17T12:49:14Z)
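The summary above states only that the target model should be favored over aligned LLMs on the same SFT data. One plausible reading is a Bradley-Terry-style preference loss over per-example sequence log-likelihoods; the sketch below assumes that form, which the abstract does not confirm.

```python
import torch
import torch.nn.functional as F

def poft_style_loss(target_logp: torch.Tensor, aligned_logp: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style preference loss (an assumed form, not the paper's).

    Both tensors hold per-example log-likelihoods of the *same* SFT responses:
    `target_logp` under the model being trained, `aligned_logp` under a frozen
    aligned LLM. Minimizing the loss pushes the target model to assign higher
    likelihood to the data than the aligned model does.
    """
    return -F.logsigmoid(target_logp - aligned_logp).mean()

# Toy usage with made-up sequence log-likelihoods for two examples.
loss = poft_style_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.0]))
```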
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
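As a rough illustration of the generate-then-evaluate loop described above, the sketch below treats the agents, their sampling method, and the dual difficulty/quality scorers as opaque callables, since the summary does not specify them.

```python
from typing import Callable, Dict, List

def star_agents_round(
    agents: List[Callable[[str], Dict[str, str]]],  # seed task -> instruction/response pair
    seeds: List[str],
    score_difficulty: Callable[[Dict[str, str]], float],
    score_quality: Callable[[Dict[str, str]], float],
    keep: int = 100,
) -> List[Dict[str, str]]:
    # Step 1: diverse generation -- every agent expands every seed task.
    pool = [agent(seed) for seed in seeds for agent in agents]

    # Step 2: dual evaluation -- rank candidates by combined difficulty and
    # quality scores, keeping the top slice for instruction tuning.
    ranked = sorted(pool, key=lambda ex: score_difficulty(ex) + score_quality(ex), reverse=True)
    return ranked[:keep]
```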
- Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification [7.357494019212501]
We propose efficient weighted-loss approaches to align synthetic data with the real-world distribution.
We empirically assessed the effectiveness of our method on multiple text classification tasks.
arXiv Detail & Related papers (2024-10-28T20:53:49Z)
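A weighted-loss scheme of the kind this summary describes can be as simple as per-example weights on the cross-entropy; the sketch below takes the weights as given, since the estimator that aligns synthetic data with the real distribution is not specified here.

```python
import torch
import torch.nn.functional as F

def weighted_ce(logits: torch.Tensor, labels: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Cross-entropy with one trust weight per (synthetic) training example."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()

# Toy usage: three synthetic examples, two classes, unequal trust in each sample.
logits = torch.randn(3, 2)
labels = torch.tensor([0, 1, 1])
weights = torch.tensor([1.0, 0.3, 0.7])
loss = weighted_ce(logits, labels, weights)
```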
- Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of the training data, which usually yields a lower training loss.
Based on the findings of the entropy law, we propose a quite efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z)
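Measuring the compression ratio that the entropy law relates to performance is straightforward; the sketch below uses zlib as a stand-in compressor (the paper's choice of compressor and its exact selection rule are not given in the summary).

```python
import zlib

def compression_ratio(samples: list[str]) -> float:
    """Compressed size over raw size for a candidate training set.

    Lower values mean more redundant (more compressible) data; the entropy
    law above ties this kind of ratio to downstream performance, so it can
    serve as a cheap proxy when comparing candidate data subsets.
    """
    raw = "\n".join(samples).encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

# Toy usage: a redundant corpus compresses far better than a diverse one.
print(compression_ratio(["the cat sat"] * 100))
print(compression_ratio([f"example {i} is unique" for i in range(100)]))
```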
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
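The recipe above, setting the label-smoothing value per sample from its uncertainty, translates directly into a loss; in the sketch below, the linear map from an uncertainty in [0, 1] to a smoothing value in [0, 0.2] is an assumed choice, not the paper's.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits: torch.Tensor, labels: torch.Tensor, uncertainty: torch.Tensor) -> torch.Tensor:
    """Cross-entropy with per-sample adaptive label smoothing."""
    n_classes = logits.size(-1)
    eps = 0.2 * uncertainty  # more uncertain sample -> stronger smoothing (assumed scale)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, n_classes).float()
    # Smoothed target: (1 - eps) on the gold class, eps spread over the rest.
    target = (1.0 - eps).unsqueeze(-1) * one_hot \
        + (eps / (n_classes - 1)).unsqueeze(-1) * (1.0 - one_hot)
    return -(target * log_probs).sum(dim=-1).mean()

# Toy usage: the confident sample gets nearly plain cross-entropy,
# the uncertain one a heavily smoothed target.
loss = ual_loss(torch.randn(2, 5), torch.tensor([1, 3]), torch.tensor([0.05, 0.9]))
```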
- Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We aim to deepen the understanding of different fine-tuning strategies for large language models (LLMs).
We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, CoLA and MNLI.
Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z)
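For readers unfamiliar with Pattern-Based Fine-Tuning, the sketch below shows the general idea on a CoLA-style example: the input is recast as a cloze pattern so a masked-LM head can answer it. The pattern and verbalizer here are illustrative, not the paper's.

```python
def to_cloze(sentence: str) -> str:
    """Recast a CoLA acceptability example as a cloze-style pattern."""
    return f'"{sentence}" Is this sentence grammatical? [MASK].'

# Verbalizer: label words predicted at [MASK] map back to CoLA classes.
VERBALIZER = {"Yes": 1, "No": 0}  # 1 = acceptable, 0 = unacceptable

print(to_cloze("The boys is playing."))
```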
- SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning [16.307467144690683]
Large Language Models can achieve desirable performance with only a small amount of high-quality data.
Identifying high-quality data from vast datasets to curate small yet effective datasets has emerged as a critical challenge.
We introduce SHED, an automated dataset refinement framework based on Shapley value for instruction fine-tuning.
arXiv Detail & Related papers (2024-04-23T04:56:48Z)
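SHED's scalable approximations are not described in the summary, so the sketch below shows only the textbook Monte Carlo permutation estimator of per-example Shapley values, with the utility function (e.g. dev-set accuracy after tuning on a subset) left abstract.

```python
import random
from typing import Callable, List, Sequence

def shapley_scores(
    data: Sequence,                    # candidate instruction-tuning examples
    utility: Callable[[List], float],  # value of training on a given subset
    n_permutations: int = 50,
) -> List[float]:
    """Monte Carlo estimate of each example's Shapley value."""
    scores = [0.0] * len(data)
    for _ in range(n_permutations):
        order = list(range(len(data)))
        random.shuffle(order)
        subset: List = []
        prev = utility(subset)
        for i in order:
            subset.append(data[i])
            curr = utility(subset)
            scores[i] += (curr - prev) / n_permutations  # marginal contribution
            prev = curr
    return scores
```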
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
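The self-play mechanism above pairs the model's own generations against human SFT responses; the sketch below shows one such iteration, with the generation and preference-training backends left as placeholder callables.

```python
from typing import Callable, List, Tuple

def spin_iteration(
    model_generate: Callable[[str], str],  # current model ("opponent") samples a response
    train_preference: Callable[[List[Tuple[str, str, str]]], None],
    sft_data: List[Tuple[str, str]],       # (prompt, human response) pairs
) -> None:
    triples = []
    for prompt, human_response in sft_data:
        self_response = model_generate(prompt)  # opponent move: self-generated data
        # (prompt, chosen, rejected): the next model is trained to tell the
        # human response apart from its predecessor's own generation.
        triples.append((prompt, human_response, self_response))
    train_preference(triples)
```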
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
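Selective use of unlabeled data via sample weighting, as this summary describes, amounts to weighting the unsupervised loss term; the sketch below leaves the weight estimation (e.g. from out-of-distribution scores) abstract.

```python
import torch

def weighted_ssl_loss(
    sup_loss: torch.Tensor,      # scalar supervised loss
    unsup_losses: torch.Tensor,  # one loss per unlabeled sample
    weights: torch.Tensor,       # conduciveness weight per unlabeled sample
    lam: float = 1.0,
) -> torch.Tensor:
    # Out-of-distribution unlabeled points get small weights, so only
    # conducive samples drive the unsupervised term.
    return sup_loss + lam * (weights * unsup_losses).mean()
```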
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for the text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)