Pre-training Language Models with Deterministic Factual Knowledge
- URL: http://arxiv.org/abs/2210.11165v1
- Date: Thu, 20 Oct 2022 11:04:09 GMT
- Title: Pre-training Language Models with Deterministic Factual Knowledge
- Authors: Shaobo Li, Xiaoguang Li, Lifeng Shang, Chengjie Sun, Bingquan Liu,
Zhenzhou Ji, Xin Jiang and Qun Liu
- Abstract summary: We propose to let PLMs learn the deterministic relationship between the remaining context and the masked content.
Two pre-training tasks are introduced to motivate PLMs to rely on the deterministic relationship when filling masks.
Experiments indicate that the continuously pre-trained PLMs achieve better robustness in capturing factual knowledge.
- Score: 42.812774794720895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works show that Pre-trained Language Models (PLMs) can
capture factual knowledge. However, some analyses reveal that PLMs fail to do
so robustly, e.g., they are sensitive to changes in the prompts used to
extract factual knowledge. To mitigate this issue, we propose to let PLMs
learn the deterministic relationship between the remaining context and the
masked content. The deterministic relationship ensures that the masked factual
content can be deterministically inferred from the clues remaining in the
context, which provides more stable patterns for PLMs to capture factual
knowledge than random masking does. Two pre-training tasks are further
introduced to motivate PLMs to rely on the deterministic relationship when
filling masks. Specifically, we use an external Knowledge Base (KB) to
identify deterministic relationships and continuously pre-train PLMs with the
proposed methods. Factual knowledge probing experiments indicate that the
continuously pre-trained PLMs achieve better robustness in capturing factual
knowledge. Further experiments on question-answering datasets show that
learning the deterministic relationship with the proposed methods also
benefits other knowledge-intensive tasks.
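The abstract gives no implementation details, but the masking idea it describes can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's pre-training code: the toy KB of (subject, relation) -> object triples, the helper name `deterministic_mask`, and the "[MASK]" convention are all assumptions made for the example.

```python
# Illustrative sketch of KB-guided "deterministic" masking (not the authors' code).
# Assumption: a fact is only masked when its object can be deterministically
# inferred from the remaining context via a Knowledge Base triple.
from typing import Optional

TOY_KB = {
    ("Douglas Adams", "place of birth"): "Cambridge",
    ("Marie Curie", "field of work"): "physics",
}

def deterministic_mask(sentence: str, kb: dict) -> Optional[str]:
    """Mask the object entity of a KB triple whose subject stays visible in the
    context, so the masked span is recoverable from the remaining clues."""
    for (subject, _relation), obj in kb.items():
        if subject in sentence and obj in sentence:
            # Random masking might hide the subject instead, leaving no stable
            # clue; here the KB object is always the masked target.
            return sentence.replace(obj, "[MASK]", 1)
    return None  # no deterministic clue found; fall back to ordinary MLM masking

if __name__ == "__main__":
    text = "Douglas Adams was born in Cambridge in 1952."
    print(deterministic_mask(text, TOY_KB))
    # -> Douglas Adams was born in [MASK] in 1952.
```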
Related papers
- Analysing the Residual Stream of Language Models Under Knowledge Conflicts [23.96385393039587]
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters.
However, their parametric knowledge may conflict with the information provided in the context.
This can lead to undesirable model behaviour, such as reliance on outdated or incorrect information.
arXiv Detail & Related papers (2024-10-21T15:12:51Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical domains such as healthcare, is not well established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
- The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs [2.9443699603751536]
Factual knowledge encoded in Pre-trained Language Models (PLMs) enriches their representations and justifies their use as knowledge bases.
Previous work has focused on probing PLMs for factual knowledge by measuring how often they can correctly predict an object entity given a subject and a relation.
In this work, we consider a complementary aspect, namely the coherency of factual knowledge in PLMs, i.e., how often PLMs can predict the subject entity given their initial prediction of the object entity.
arXiv Detail & Related papers (2024-02-02T14:42:09Z)
- Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models [2.3981254787726067]
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge.
This has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs.
In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge.
arXiv Detail & Related papers (2023-10-25T11:57:13Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from an external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z)
- Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base [51.55027623439027]
We develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner.
To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
arXiv Detail & Related papers (2022-04-17T12:33:34Z)
- RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen during training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.