Knowledge Inheritance for Pre-trained Language Models
- URL: http://arxiv.org/abs/2105.13880v1
- Date: Fri, 28 May 2021 14:43:26 GMT
- Title: Knowledge Inheritance for Pre-trained Language Models
- Authors: Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang,
Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
- Abstract summary: We introduce a novel pre-training framework named "knowledge inheritance" (KI).
KI combines self-learning and teacher-guided learning to efficiently train larger PLMs.
We show that KI effectively supports lifelong learning and knowledge transfer.
- Score: 57.51305807391381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent explorations of large-scale pre-trained language models (PLMs) such as
GPT-3 have revealed the power of PLMs with huge amounts of parameters, setting
off a wave of training ever-larger PLMs. However, training a large-scale PLM
requires tremendous amounts of computational resources, which is time-consuming
and expensive. In addition, existing large-scale PLMs are mainly trained from
scratch individually, ignoring the availability of many existing well-trained
PLMs. To this end, we explore how previously trained PLMs can benefit the
training of larger PLMs in the future. Specifically, we introduce a novel
pre-training framework named "knowledge inheritance" (KI), which combines both
self-learning and teacher-guided learning to efficiently train larger PLMs.
Extensive experimental results demonstrate the feasibility of our KI
framework. We also conduct empirical analyses to explore the effects of teacher
PLMs' pre-training settings, including model architecture, pre-training data,
etc. Finally, we show that KI effectively supports lifelong learning and
knowledge transfer.
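The abstract's core idea, combining a self-learning objective with a teacher-guided objective so a larger student PLM can inherit knowledge from a smaller trained teacher, can be illustrated with a minimal sketch. This is not the authors' released implementation: the function names, the linear weighting `alpha`, and the temperature are illustrative assumptions, roughly in the spirit of knowledge distillation.

```python
import torch
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, alpha, temperature=2.0):
    """Illustrative KI-style training loss (a sketch, not the paper's exact recipe).

    alpha weights the teacher's guidance; annealing it from 1.0 toward 0.0
    over training would let the larger student first inherit the teacher's
    knowledge and then continue learning on its own.
    """
    vocab_size = student_logits.size(-1)
    # Self-learning: standard masked-language-modeling cross-entropy.
    self_loss = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        labels.view(-1),
        ignore_index=-100,  # convention for unmasked positions
    )
    # Teacher-guided learning: KL divergence to the teacher's softened distribution.
    t = temperature
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)  # rescale gradients to match the hard-label loss
    return alpha * kd_loss + (1.0 - alpha) * self_loss

# Toy shapes: batch of 2 sequences, length 4, vocabulary of 10 tokens.
student_logits = torch.randn(2, 4, 10)
teacher_logits = torch.randn(2, 4, 10)
labels = torch.randint(0, 10, (2, 4))
loss = ki_loss(student_logits, teacher_logits, labels, alpha=0.5)
```

In practice the teacher logits would come from a frozen, already-trained PLM run on the same masked batch, and only the student's parameters would receive gradients.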
Related papers
- How Do Large Language Models Acquire Factual Knowledge During Pretraining? [36.59608982935844]
We study how large language models (LLMs) acquire factual knowledge during pretraining.
Findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining.
arXiv Detail & Related papers (2024-06-17T17:54:40Z)
- The Future of Large Language Model Pre-training is Federated [15.237418036900582]
Federated learning has the potential to unleash the majority of the planet's data and computational resources.
We propose a scalable deployment system called Photon to enable the investigation and development of this new training paradigm.
We show that Photon can be used by organizations interested in collaborating with their private data sources and computational resources.
arXiv Detail & Related papers (2024-05-17T15:27:52Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents [16.24662355253529]
Large Language Models (LLMs) can address sequential decision-making tasks through the provision of high-level instructions.
LLMs lack specialization in tackling specific target problems, particularly in real-time dynamic environments.
We introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent.
arXiv Detail & Related papers (2023-11-22T13:15:42Z)
- Knowledge Editing for Large Language Models: A Survey [51.01368551235289]
One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge.
arXiv Detail & Related papers (2023-10-24T22:18:13Z)
- Rethinking Learning Rate Tuning in the Era of Large Language Models [11.87985768634266]
Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance.
It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications.
Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs).
arXiv Detail & Related papers (2023-09-16T03:37:00Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
Previous work shows that introducing an extra calibration task can mitigate the issue of miscalibrated confidence.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.