Knowledge Inheritance for Pre-trained Language Models
- URL: http://arxiv.org/abs/2105.13880v1
- Date: Fri, 28 May 2021 14:43:26 GMT
- Title: Knowledge Inheritance for Pre-trained Language Models
- Authors: Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang,
Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
- Abstract summary: We introduce a novel pre-training framework named "knowledge inheritance" (KI).
KI combines both self-learning and teacher-guided learning to efficiently train larger PLMs.
We show that KI effectively supports lifelong learning and knowledge transfer.
- Score: 57.51305807391381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent explorations of large-scale pre-trained language models (PLMs) such as
GPT-3 have revealed the power of PLMs with huge numbers of parameters, setting
off a wave of training ever-larger PLMs. However, training a large-scale PLM
requires tremendous amounts of computational resources, which is time-consuming
and expensive. In addition, existing large-scale PLMs are mainly trained from
scratch individually, ignoring the availability of many existing well-trained
PLMs. To this end, we explore the question of how previously trained PLMs can
benefit the training of larger PLMs in the future. Specifically, we introduce a novel
pre-training framework named "knowledge inheritance" (KI), which combines both
self-learning and teacher-guided learning to efficiently train larger PLMs.
Extensive experimental results demonstrate the feasibility of our KI
framework. We also conduct empirical analyses to explore the effects of teacher
PLMs' pre-training settings, such as model architecture and pre-training data.
Finally, we show that KI effectively supports lifelong learning and knowledge
transfer.
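As a rough illustration of the framework described in the abstract, the sketch below combines a self-supervised loss (self-learning) with a distillation loss against a smaller, already-trained teacher PLM (teacher-guided learning). The KL-based distillation term, the temperature, and the linear annealing schedule are assumptions made for illustration, not the paper's exact formulation.
```python
# Minimal sketch of a knowledge-inheritance-style objective: the larger student
# PLM learns from the data (self-learning) and from a smaller, well-trained
# teacher PLM (teacher-guided learning). The annealing schedule and the
# KL-based distillation term are illustrative assumptions.
import torch
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, step, total_steps, temperature=2.0):
    # Self-learning: standard language-modeling cross-entropy on the data.
    self_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    # Teacher-guided learning: match the teacher's output distribution.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Assumed schedule: lean on the teacher early, then shift to pure self-learning.
    alpha = max(0.0, 1.0 - step / total_steps)
    return alpha * kd_loss + (1.0 - alpha) * self_loss
```
Under this framing, lifelong learning and knowledge transfer could then amount to reusing each finished model as the teacher for the next, larger one.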
Related papers
- Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs [22.177654792824896]
We focus on small-sized language models (3B to 7B parameters) for their cost-efficiency and accessibility.
We explore various training configurations and strategies across four open-source pre-trained models.
Key insights from our work include: (i) larger batch sizes paired with lower learning rates lead to improved model performance on benchmarks such as MMLU, MTBench, and the Open LLM Leaderboard; (ii) early-stage training dynamics, such as lower gradient norms and higher loss values, are strong indicators of better final model performance; and (iv) we observed no significant difference in performance between phased and stacked training strategies.
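The batch-size and learning-rate pairing in insight (i) can be pictured with a hypothetical configuration contrast like the one below; the field names and values are illustrative assumptions, not settings taken from the paper.
```python
# Hypothetical SFT hyperparameter contrast for a small (3B-7B parameter) model,
# illustrating the reported trend that larger effective batch sizes paired with
# lower learning rates tend to score better on benchmarks such as MMLU.
# All values are illustrative assumptions, not the paper's settings.
baseline_config = {
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 4,   # effective batch size 32 on one device
    "learning_rate": 2e-5,
}
preferred_config = {
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 16,  # effective batch size 128 on one device
    "learning_rate": 5e-6,              # roughly 4x lower learning rate
}
```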
arXiv Detail & Related papers (2024-12-17T21:16:59Z)
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
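The summary does not spell out how the SLM is used; one plausible, purely assumed instantiation is to let the cheap SLM score candidate pre-training documents and keep only those in a "learnable" loss band, as sketched below. The thresholds and the selection rule are illustrative, not the paper's method.
```python
# Assumed sketch of one way a small LM could assist LLM pre-training: score
# documents with the cheap SLM and keep those whose loss falls in a middle,
# "learnable" band (not trivially easy, not hopeless noise). The thresholds
# and the selection rule are illustrative assumptions.
import torch

def select_documents(slm, tokenizer, documents, low=2.0, high=6.0, device="cpu"):
    kept = []
    slm.eval()
    for text in documents:
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        with torch.no_grad():
            out = slm(**enc, labels=enc["input_ids"])
        # For Hugging Face causal-LM heads, out.loss is the mean token-level
        # cross-entropy when labels are provided.
        if low <= out.loss.item() <= high:
            kept.append(text)
    return kept
```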
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- MiniPLM: Knowledge Distillation for Pre-Training Language Models [109.83741809808483]
MiniPLM is a knowledge distillation (KD) framework for pre-training student language models.
For efficiency, MiniPLM performs offline teacher LM inference, allowing KD for multiple student LMs without adding training-time costs.
For flexibility, MiniPLM operates solely on the training corpus, enabling KD across model families.
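The offline-teacher idea above can be pictured as a one-time pass that caches teacher scores over the corpus, which any number of student runs can then reuse without further teacher forward passes. The JSONL cache format, the per-document negative log-likelihood signal, and the function names below are assumptions for illustration, not MiniPLM's actual pipeline.
```python
# Sketch of offline teacher inference: run the expensive teacher over the
# corpus once, cache a per-document score, and reuse the cache when training
# any number of students (no teacher calls at student-training time).
# The cache format and the NLL signal are illustrative assumptions.
import json
import torch

def cache_teacher_scores(teacher, tokenizer, corpus, cache_path, device="cpu"):
    teacher.eval()
    with open(cache_path, "w") as f:
        for doc_id, text in enumerate(corpus):
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            with torch.no_grad():
                out = teacher(**enc, labels=enc["input_ids"])
            f.write(json.dumps({"id": doc_id, "teacher_nll": out.loss.item()}) + "\n")

def load_teacher_scores(cache_path):
    # Students can re-weight or re-sample the corpus using these cached scores.
    with open(cache_path) as f:
        return {rec["id"]: rec["teacher_nll"] for rec in map(json.loads, f)}
```
In this sketch, nothing in the cached signal depends on the student's tokenizer or architecture, which is one way corpus-level distillation can work across model families.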
arXiv Detail & Related papers (2024-10-22T17:40:32Z)
- LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models [32.65636568742875]
Small pre-trained language models (PLMs) and large language models (LLMs) have become the mainstream approaches for log analysis.
This paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs automatically and then enhances the smaller PLM for log analysis with this expert knowledge.
LUK achieves state-of-the-art results on different log analysis tasks, and extensive experiments demonstrate that expert knowledge from LLMs can be utilized more effectively to understand logs.
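A toy illustration of the general idea: query an LLM once for expert knowledge about a raw log line, then hand the smaller PLM both the log and that knowledge. The prompt, the placeholder `ask_llm` call, and the simple concatenation are hypothetical; LUK's actual architecture and training procedure may differ.
```python
# Toy sketch: enrich a raw log line with LLM-provided expert knowledge before
# it reaches the smaller PLM. `ask_llm` is a hypothetical placeholder for any
# LLM API; the concatenation format is an illustrative assumption.
def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("wire this to your LLM provider of choice")

def enhance_log_input(log_line: str) -> str:
    prompt = (
        "You are a reliability engineer. Briefly explain the likely cause and "
        f"severity of this log message:\n{log_line}"
    )
    expert_knowledge = ask_llm(prompt)
    # The smaller PLM can then be fine-tuned on the enriched input rather than
    # the raw log alone.
    return f"[LOG] {log_line} [KNOWLEDGE] {expert_knowledge}"
```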
arXiv Detail & Related papers (2024-09-03T13:58:34Z)
- The Future of Large Language Model Pre-training is Federated [15.237418036900582]
We propose a scalable deployment system called Photon to enable the investigation and development of this federated training paradigm for LLM pre-training.
We show that Photon can be used by organizations interested in collaborating with their private data sources and computational resources for pre-training LLMs with billions of parameters.
We further show that the effectiveness of federated training scales with model size and present our approach for training billion-scale federated LLMs using limited resources.
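For orientation only, the sketch below shows plain federated averaging over client model weights, the textbook mechanism behind collaborative training on private data; it is a stand-in assumption, not Photon's actual protocol or system design.
```python
# Minimal federated-averaging sketch: each organization trains locally on its
# private data and only model weights are exchanged and averaged. Plain FedAvg
# is used here as an illustrative stand-in; Photon's protocol may differ.
import copy
import torch

def fedavg(global_model, client_states, client_weights):
    """Weighted average of client state_dicts (weights could be local token counts)."""
    total = float(sum(client_weights))
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = sum(
            w * state[key].float() for state, w in zip(client_states, client_weights)
        ) / total
    global_model.load_state_dict(avg_state)
    return global_model
```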
arXiv Detail & Related papers (2024-05-17T15:27:52Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Knowledge Editing for Large Language Models: A Survey [51.01368551235289]
One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME), which aims to precisely modify LLMs to incorporate specific knowledge, has attracted increasing attention.
arXiv Detail & Related papers (2023-10-24T22:18:13Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.