Knowledge Inheritance for Pre-trained Language Models
- URL: http://arxiv.org/abs/2105.13880v1
- Date: Fri, 28 May 2021 14:43:26 GMT
- Title: Knowledge Inheritance for Pre-trained Language Models
- Authors: Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang,
Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
- Abstract summary: We introduce a novel pre-training framework named "knowledge inheritance" (KI)
KI combines both self-learning and teacher-guided learning to efficiently train larger PLMs.
We show that KI can well support lifelong learning and knowledge transfer.
- Score: 57.51305807391381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent explorations of large-scale pre-trained language models (PLMs) such as
GPT-3 have revealed the power of PLMs with huge amounts of parameters, setting
off a wave of training ever-larger PLMs. However, training a large-scale PLM
requires tremendous amounts of computational resources, which is time-consuming
and expensive. In addition, existing large-scale PLMs are mainly trained from
scratch individually, ignoring the availability of many existing well-trained
PLMs. To this end, we explore the question of how previously trained PLMs can
benefit the training of larger PLMs in the future. Specifically, we introduce a novel
pre-training framework named "knowledge inheritance" (KI), which combines both
self-learning and teacher-guided learning to efficiently train larger PLMs.
Extensive experimental results demonstrate the feasibility of our KI
framework. We also conduct empirical analyses to explore the effects of teacher
PLMs' pre-training settings, including model architecture, pre-training data,
etc. Finally, we show that KI can well support lifelong learning and knowledge
transfer.
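The abstract's core idea, weighting a self-learning objective against a teacher-guided objective, can be sketched as a combined loss. This is a minimal illustrative sketch, not the authors' implementation: the `ki_loss` function, the fixed `alpha` weighting, and the temperature handling are assumptions for illustration, and the paper's actual loss design and annealing schedule may differ.

```python
import torch
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, alpha, temperature=1.0):
    """Combine self-learning and teacher-guided learning (hypothetical sketch).

    alpha weights the student's own language-modeling loss against the
    KL divergence to the teacher's output distribution; a schedule that
    gradually raises alpha shifts training from imitation to self-learning.
    """
    vocab = student_logits.size(-1)
    # Self-learning: ordinary language-modeling cross-entropy on the labels.
    self_loss = F.cross_entropy(student_logits.view(-1, vocab),
                                labels.view(-1), ignore_index=-100)
    # Teacher-guided learning: match the teacher's softened distribution.
    t = temperature
    kd_loss = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                       F.softmax(teacher_logits / t, dim=-1),
                       reduction="batchmean") * (t * t)
    return alpha * self_loss + (1.0 - alpha) * kd_loss
```

With `alpha=1.0` this reduces to pure self-learning; with `alpha=0.0` it is pure distillation from the teacher.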
Related papers
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- MiniPLM: Knowledge Distillation for Pre-Training Language Models [109.83741809808483]
MiniPLM is a KD framework for pre-training student language models.
For efficiency, MiniPLM performs offline teacher LM inference, allowing KD for multiple student LMs without adding training-time costs.
For flexibility, MiniPLM operates solely on the training corpus, enabling KD across model families.
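The offline-teacher idea described above, running the teacher once over the corpus and caching its outputs so that any number of students can be distilled without extra teacher compute, can be sketched generically. This is not MiniPLM's actual method (which operates on the training corpus itself); `cache_teacher_topk` and its top-k truncation are illustrative assumptions.

```python
import torch

def cache_teacher_topk(teacher, corpus_batches, k=8):
    """Run the teacher once over the corpus and cache its top-k logits.

    Caching only the top-k entries (instead of the full vocabulary
    distribution) keeps the cache small; students are later trained
    against these cached targets, so adding more students incurs no
    additional teacher inference cost.
    """
    cache = []
    teacher.eval()
    with torch.no_grad():
        for batch in corpus_batches:
            logits = teacher(batch)                     # (B, T, V)
            top_vals, top_idx = logits.topk(k, dim=-1)  # keep k entries per position
            cache.append((top_vals, top_idx))
    return cache
```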
arXiv Detail & Related papers (2024-10-22T17:40:32Z)
- How Do Large Language Models Acquire Factual Knowledge During Pretraining? [36.59608982935844]
We study how large language models (LLMs) acquire factual knowledge during pretraining.
Findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining.
arXiv Detail & Related papers (2024-06-17T17:54:40Z)
- The Future of Large Language Model Pre-training is Federated [15.237418036900582]
We propose a scalable deployment system called Photon to enable the investigation and development of this new training paradigm for LLM pre-training.
We show that Photon can be used by organizations interested in collaborating with their private data sources and computational resources for pre-training LLMs with billions of parameters.
We further show the effectiveness of the federated training scales with model size and present our approach for training billion-scale federated LLMs using limited resources.
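Photon's internals are not described in this blurb; the sketch below shows only the generic FedAvg-style aggregation step that federated pre-training builds on, where organizations train locally on private data and share parameters rather than data. The `federated_average` helper and its equal weighting are illustrative assumptions (real systems typically weight clients by data size).

```python
import torch

def federated_average(client_state_dicts, weights=None):
    """Aggregate client models by (weighted) parameter averaging.

    A generic FedAvg-style step: each participant trains locally on its
    own data, then only the model parameters are shared and averaged.
    """
    n = len(client_state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting; an assumption for illustration
    avg = {}
    for key in client_state_dicts[0]:
        avg[key] = sum(w * sd[key].float()
                       for w, sd in zip(weights, client_state_dicts))
    return avg
```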
arXiv Detail & Related papers (2024-05-17T15:27:52Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training due to the high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Knowledge Editing for Large Language Models: A Survey [51.01368551235289]
One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge.
arXiv Detail & Related papers (2023-10-24T22:18:13Z)
- Rethinking Learning Rate Tuning in the Era of Large Language Models [11.87985768634266]
Large Language Models (LLMs) represent a recent success of deep learning, achieving remarkable human-like predictive performance.
It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications.
Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs).
arXiv Detail & Related papers (2023-09-16T03:37:00Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.