Related papers: Evolution without Large Models: Training Language Model with Task Principles

Evolution without Large Models: Training Language Model with Task Principles

URL: http://arxiv.org/abs/2507.05991v1
Date: Tue, 08 Jul 2025 13:52:45 GMT
Title: Evolution without Large Models: Training Language Model with Task Principles
Authors: Minghang Zhu, Shen Gao, Zhengliang Shi, Jiabao Fang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, Shuo Shang,
Abstract summary: A common training approach for language models involves using a large-scale language model to expand a human-provided dataset.<n>This method significantly reduces training costs by eliminating the need for extensive human data annotation.<n>However, it still faces challenges such as high carbon emissions during data augmentation and the risk of data leakage.
Score: 52.44569608690695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common training approach for language models involves using a large-scale language model to expand a human-provided dataset, which is subsequently used for model training.This method significantly reduces training costs by eliminating the need for extensive human data annotation. However, it still faces challenges such as high carbon emissions during data augmentation and the risk of data leakage when we use closed-source LLMs. To address these issues, we propose a self-evolution method for language models. First, we introduce the Multi-level Principle Generation, which enables a large-scale model to summarize task-completion principles based on a small amount of task data. Then, we propose the Principle-based Instance Generation, in which a smaller-scale language model uses these task principles to generate a large amount of data. This data is then used for model training. Experimental results show that our proposed method significantly improves model performance compared to directly using a smaller-scale language model to generate data. Additionally, since we only use the large-scale language model to generate the task-completion principles, the carbon emissions associated with training the model are greatly reduced.

Related papers

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model [27.25688303240741]
KaLM-Embedding is a general multilingual embedding model that leverages a large quantity of cleaner, more diverse, and domain-specific training data.<n>Our model has been trained with key techniques proven to enhance performance.
arXiv Detail & Related papers (2025-01-02T03:17:51Z)
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.<n>LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.<n>Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
InkubaLM: A small language model for low-resource African languages [9.426968756845389]
InkubaLM is a small language model with 0.4 billion parameters. It achieves performance comparable to models with significantly larger parameter counts. It demonstrates remarkable consistency across multiple languages.
arXiv Detail & Related papers (2024-08-30T05:42:31Z)
Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT) We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
HARE: HumAn pRiors, a key to small language model Efficiency [6.253561984966316]
Human priors play a crucial role in efficiently utilizing data in deep learning. Existing Small Language Models mainly rely on web-scraped large-scale training data. We propose a principle to leverage human priors for data construction.
arXiv Detail & Related papers (2024-06-17T10:56:03Z)
Emergent Abilities in Reduced-Scale Generative Language Models [10.51168925267033]
Large language models can solve new tasks without task-specific fine-tuning. This ability is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data.
arXiv Detail & Related papers (2024-04-02T18:00:28Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.<n>This creates a barrier to fusing knowledge across individual models to yield a better single model.<n>We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO) The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z)
bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model. bert2BERT saves about 45% and 47% computational cost of pre-training BERT_BASE and GPT_BASE by reusing the models of almost their half sizes.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.