Informed Pre-Training on Prior Knowledge
- URL: http://arxiv.org/abs/2205.11433v1
- Date: Mon, 23 May 2022 16:24:40 GMT
- Title: Informed Pre-Training on Prior Knowledge
- Authors: Laura von Rueden, Sebastian Houben, Kostadin Cvejoski, Christian
Bauckhage, Nico Piatkowski
- Abstract summary: When training data is scarce, the incorporation of additional prior knowledge can assist the learning process.
In this paper, we propose a novel informed machine learning approach and suggest pre-training on prior knowledge.
- Score: 6.666503127282259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training data is scarce, the incorporation of additional prior knowledge
can assist the learning process. While it is common to initialize neural
networks with weights that have been pre-trained on other large data sets,
pre-training on more concise forms of knowledge has rather been overlooked. In
this paper, we propose a novel informed machine learning approach and suggest
pre-training on prior knowledge. Formal knowledge representations, e.g. graphs
or equations, are first transformed into a small and condensed data set of
knowledge prototypes. We show that informed pre-training on such knowledge
prototypes (i) speeds up the learning processes, (ii) improves generalization
capabilities in the regime where not enough training data is available, and
(iii) increases model robustness. Analyzing which parts of the model are
affected most by the prototypes reveals that improvements come from deeper
layers that typically represent high-level features. This confirms that
informed pre-training can indeed transfer semantic knowledge. This is a novel
effect, which shows that knowledge-based pre-training has additional and
complementary strengths to existing approaches.
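The recipe in the abstract (turn a formal knowledge representation into a small, condensed set of knowledge prototypes, pre-train on those prototypes, then fine-tune on the scarce task data) can be illustrated with a minimal sketch. The toy equation y = sin(x) as the knowledge source, the small MLP, and all hyperparameters below are illustrative assumptions, not the authors' actual setup.
```python
# Hedged sketch of informed pre-training: (1) derive knowledge prototypes from a
# formal representation (here: a toy equation), (2) pre-train on the prototypes,
# (3) fine-tune on the scarce real training data. Names and settings are assumed.
import torch
import torch.nn as nn

def prototypes_from_equation(n=64):
    """Sample a small prototype set from an assumed known equation y = sin(x)."""
    x = torch.linspace(-3.14, 3.14, n).unsqueeze(1)
    return x, torch.sin(x)

def train(model, x, y, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

# Stage 1: informed pre-training on knowledge prototypes derived from prior knowledge.
proto_x, proto_y = prototypes_from_equation()
train(model, proto_x, proto_y, epochs=200, lr=1e-3)

# Stage 2: ordinary fine-tuning on the scarce task data (placeholder tensors here).
task_x, task_y = torch.randn(32, 1), torch.randn(32, 1)
train(model, task_x, task_y, epochs=100, lr=1e-4)
```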
Related papers
- Why pre-training is beneficial for downstream classification tasks? [32.331679393303446]
We propose to quantitatively and explicitly explain the effects of pre-training on downstream tasks from a novel game-theoretic view.
Specifically, we extract and quantify the knowledge encoded by the pre-trained model.
We discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks.
arXiv Detail & Related papers (2024-10-11T02:13:16Z)
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Step Out and Seek Around: On Warm-Start Training with Incremental Data [28.85668076145673]
Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving.
Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning.
We propose Knowledge Consolidation and Acquisition (CKCA), a continuous model improvement algorithm with two novel components.
arXiv Detail & Related papers (2024-06-06T20:12:55Z)
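A minimal sketch of the warm-starting baseline referred to in the entry above: when a new data increment arrives, training resumes from the previous checkpoint rather than from a fresh initialization. The checkpoint path, training loop, and hyperparameters are assumptions for illustration; this is not the proposed CKCA algorithm.
```python
# Hedged sketch of warm-start training on incrementally arriving data:
# resume from the last checkpoint instead of re-initializing the model.
import torch

def warm_start_update(model, optimizer, new_loader, ckpt_path="checkpoint.pt", epochs=3):
    # Load the previous checkpoint if one exists (warm start); otherwise cold-start.
    try:
        state = torch.load(ckpt_path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
    except FileNotFoundError:
        pass  # first increment: no checkpoint yet

    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in new_loader:  # train only on the newly arrived data increment
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

    # Save the updated checkpoint for the next data increment.
    torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, ckpt_path)
```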
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
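A hedged sketch of the core idea behind the LM up-scaling variant of emulated fine-tuning described above: the next-token distribution of a large pre-trained model is combined with the behavioral delta between a small fine-tuned model and its small pre-trained base. The function name, tensor layout, and renormalization details are assumptions rather than the paper's exact implementation.
```python
# Hedged sketch of EFT up-scaling: ensemble a LARGE base model with the
# (fine-tuned minus base) log-probability delta of a SMALL model pair.
# Assumes all three models share the same tokenizer/vocabulary.
import torch

def eft_upscale_logprobs(logp_large_base, logp_small_ft, logp_small_base):
    """Inputs: [vocab_size] next-token log-probabilities from the three models."""
    combined = logp_large_base + (logp_small_ft - logp_small_base)
    return torch.log_softmax(combined, dim=-1)  # renormalize over the vocabulary

# Usage (illustrative): at each decoding step, sample from the combined distribution.
# next_token = torch.multinomial(eft_upscale_logprobs(a, b, c).exp(), num_samples=1)
```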
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [65.57123249246358]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors [59.93972277761501]
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches.
This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
arXiv Detail & Related papers (2022-05-20T16:19:30Z)
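A minimal sketch of how a posterior learned on the source task could serve as an informative prior downstream, simplified here to a diagonal Gaussian that reduces to a per-parameter weighted L2 penalty toward the source-task mean. The paper works with richer posteriors; the function name, dictionary layout, and scale factor are assumptions.
```python
# Hedged sketch: negative log of an assumed diagonal Gaussian prior over weights,
# learned on the source task, used as a regularizer during downstream training.
import torch

def informative_prior_penalty(model, prior_mean, prior_var, scale=1e-3):
    """prior_mean / prior_var: dicts of tensors keyed by parameter name."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (((p - prior_mean[name]) ** 2) / (2.0 * prior_var[name])).sum()
    return scale * penalty

# Downstream training step (illustrative): task loss plus the prior penalty,
# replacing a generic weight decay toward zero.
# loss = task_loss_fn(model(x), y) + informative_prior_penalty(model, mu, var)
```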
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
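A hedged sketch of distillation-as-pre-training as summarized above: the student's features are aligned with a frozen pre-trained teacher's features, after which the student would be fine-tuned on downstream tasks. The projector, the MSE objective, and the training-step structure are illustrative assumptions, not the paper's exact method.
```python
# Hedged sketch: feature-level distillation from a frozen pre-trained teacher
# as a cheap substitute for full pre-training of the student.
import torch
import torch.nn as nn

def distill_step(student, teacher, projector, x, optimizer):
    teacher.eval()
    with torch.no_grad():
        t_feat = teacher(x)             # frozen pre-trained teacher features
    s_feat = projector(student(x))      # project student features to the teacher's dimension
    loss = nn.functional.mse_loss(s_feat, t_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```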
- Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge [91.15301779076187]
We introduce verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences.
We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
arXiv Detail & Related papers (2021-12-16T03:13:04Z)
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models into which relation triples from knowledge graphs are injected to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that Decomposes the Knowledge injection process of Pre-trained Language Models across the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
- The Role of Bio-Inspired Modularity in General Learning [0.0]
One goal of general intelligence is to learn novel information without overwriting prior learning.
Bootstrapping previous knowledge may allow for faster learning of a novel task.
Modularity may offer a solution for weight-update learning methods that satisfies both the no-catastrophic-forgetting and the bootstrapping constraints.
arXiv Detail & Related papers (2021-09-23T18:45:34Z)