Pre-Trained Models: Past, Present and Future
- URL: http://arxiv.org/abs/2106.07139v2
- Date: Tue, 15 Jun 2021 07:08:31 GMT
- Title: Pre-Trained Models: Past, Present and Future
- Authors: Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo,
Jiezhong Qiu, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan,
Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong
Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
- Abstract summary: Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
- Score: 126.21572378910746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently
achieved great success and become a milestone in the field of artificial
intelligence (AI). Owing to sophisticated pre-training objectives and huge
model parameters, large-scale PTMs can effectively capture knowledge from
massive labeled and unlabeled data. By storing knowledge in huge parameters
and fine-tuning on specific tasks, the rich knowledge implicitly encoded in
huge parameters can benefit a variety of downstream tasks, which has been
extensively demonstrated via experimental verification and empirical analysis.
It is now the consensus of the AI community to adopt PTMs as the backbone for
downstream tasks rather than learning models from scratch. In this paper, we
take a deep look into the history of pre-training, especially its special
relation with transfer learning and self-supervised learning, to reveal the
crucial position of PTMs in the AI development spectrum. Further, we
comprehensively review the latest breakthroughs of PTMs. These breakthroughs
are driven by the surge of computational power and the increasing availability
of data, towards four important directions: designing effective architectures,
utilizing rich contexts, improving computational efficiency, and conducting
interpretation and theoretical analysis. Finally, we discuss a series of open
problems and research directions of PTMs, and hope our view can inspire and
advance the future study of PTMs.
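
To make the pre-train-then-fine-tune paradigm described in the abstract concrete, the sketch below adapts a pre-trained BERT backbone to a toy classification task instead of learning a model from scratch. This is a minimal illustration, assuming the Hugging Face transformers and PyTorch libraries; the checkpoint name, toy data, and hyperparameters are illustrative placeholders, not taken from the paper.

```python
# Minimal pre-train-then-fine-tune sketch (hedged example, not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reuse pre-trained weights as the backbone; only a small task head is new.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical labeled downstream data: (text, label) pairs.
train_pairs = [("a great movie", 1), ("a dull movie", 0)]
encodings = tokenizer([t for t, _ in train_pairs], padding=True,
                      truncation=True, return_tensors="pt")
labels = torch.tensor([y for _, y in train_pairs])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few epochs usually suffice when starting from a PTM
    optimizer.zero_grad()
    outputs = model(**encodings, labels=labels)
    outputs.loss.backward()  # fine-tune the backbone and the task head jointly
    optimizer.step()
```

The point of the sketch is the division of labor: the expensive knowledge acquisition happens once during pre-training, while the downstream task only pays for a short fine-tuning run.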
Related papers
- Intellectual Property Protection for Deep Learning Model and Dataset Intelligence [21.757997058357]
This work systematically summarizes the general and scheme-specific performance evaluation metrics.
From proactive IP infringement prevention and reactive IP ownership verification perspectives, it comprehensively investigates and analyzes the existing IPP methods.
Finally, we outline prospects for promising future directions that may act as a guide for innovative research.
arXiv Detail & Related papers (2024-11-07T09:02:41Z)
- Long Term Memory: The Foundation of AI Self-Evolution [48.52678410533424]
Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning.
Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models.
Beyond large-scale training, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution.
arXiv Detail & Related papers (2024-10-21T06:09:30Z)
- Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective [60.64922606733441]
We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study the pre-training of Foundation Models (FMs).
In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style.
arXiv Detail & Related papers (2024-06-17T06:20:39Z)
- Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems [4.359769884713738]
We propose an LSTM-BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT.
The results indicate that LBKT is faster and more interpretable, and has a lower memory cost, than traditional deep-learning-based Knowledge Tracing methods.
arXiv Detail & Related papers (2024-04-24T18:19:44Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
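
As a rough illustration of an occupancy-prediction pretext task of this kind, the hypothetical PyTorch sketch below trains a stand-in point-feature encoder to predict per-voxel occupancy with a binary cross-entropy loss; the encoder is then the transferable backbone. None of the module names, shapes, or hyperparameters come from the SPOT paper.

```python
# Hypothetical occupancy-prediction pre-training step (not SPOT's actual code):
# an encoder maps per-voxel point features to embeddings, and a small head
# predicts whether each voxel is occupied.
import torch
import torch.nn as nn

NUM_VOXELS, POINT_DIM, FEAT_DIM = 1024, 3, 64

encoder = nn.Sequential(            # stand-in for a real point-cloud backbone
    nn.Linear(POINT_DIM, FEAT_DIM), nn.ReLU(), nn.Linear(FEAT_DIM, FEAT_DIM)
)
occupancy_head = nn.Linear(FEAT_DIM, 1)  # per-voxel occupancy logit
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(occupancy_head.parameters()), lr=1e-3
)

# Fake batch: one pooled point feature per voxel, plus a 0/1 occupancy target.
voxel_points = torch.randn(NUM_VOXELS, POINT_DIM)
occupancy_target = (torch.rand(NUM_VOXELS, 1) > 0.5).float()

logits = occupancy_head(encoder(voxel_points))
loss = criterion(logits, occupancy_target)
loss.backward()
optimizer.step()  # after pre-training, `encoder` can be fine-tuned downstream
```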
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- On the Predictive Accuracy of Neural Temporal Point Process Models for Continuous-time Event Data [3.13468877208035]
Temporal Point Processes (TPPs) serve as the standard mathematical framework for modeling asynchronous event sequences in continuous time.
Researchers have proposed Neural TPPs, which leverage neural network parametrizations to offer more flexible and efficient modeling.
This study systematically evaluates the predictive accuracy of state-of-the-art neural TPP models.
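
For reference, the TPP framework mentioned above describes an event sequence through a history-dependent conditional intensity, and a neural TPP simply parametrizes that intensity with a neural network. The log-likelihood below is the textbook form of this objective, not a formula taken from this particular paper.

```latex
% Log-likelihood of events {t_1, ..., t_n} observed on [0, T] under a TPP
% with conditional intensity \lambda^*(t); neural TPPs learn \lambda^*(t)
% with a neural network and maximize this quantity.
\log L = \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_{0}^{T} \lambda^*(t)\, dt
```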
arXiv Detail & Related papers (2023-06-29T16:14:43Z)
- Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Large multi-modal pre-trained models have drawn increasing attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training work in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z)
- PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale [14.444439310266873]
Predictive Information QT-Opt learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world.
We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks.
arXiv Detail & Related papers (2022-10-15T07:30:31Z)
- Do we need to go Deep? Knowledge Tracing with Big Data [5.218882272051637]
We use EdNet, the largest publicly available student interaction dataset in the education domain, to understand how accurately both deep and traditional models predict future student performance.
Through extensive experimentation, we observe that logistic regression models with carefully engineered features outperform deep models.
arXiv Detail & Related papers (2021-01-20T22:40:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.