Pre-training Language Models with Deterministic Factual Knowledge
- URL: http://arxiv.org/abs/2210.11165v1
- Date: Thu, 20 Oct 2022 11:04:09 GMT
- Title: Pre-training Language Models with Deterministic Factual Knowledge
- Authors: Shaobo Li, Xiaoguang Li, Lifeng Shang, Chengjie Sun, Bingquan Liu,
Zhenzhou Ji, Xin Jiang and Qun Liu
- Abstract summary: We propose to let PLMs learn the deterministic relationship between the remaining context and the masked content.
Two pre-training tasks are introduced to motivate PLMs to rely on the deterministic relationship when filling masks.
Experiments indicate that the continuously pre-trained PLMs achieve better robustness in factual knowledge capturing.
- Score: 42.812774794720895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works show that Pre-trained Language Models (PLMs) can capture
factual knowledge. However, some analyses reveal that PLMs fail to capture it
robustly, e.g., being sensitive to changes in prompts when extracting
factual knowledge. To mitigate this issue, we propose to let PLMs learn the
deterministic relationship between the remaining context and the masked
content. The deterministic relationship ensures that the masked factual content
can be deterministically inferred from the remaining clues in the context.
This provides more stable patterns for PLMs to capture factual knowledge
than random masking. Two pre-training tasks are further introduced to
motivate PLMs to rely on the deterministic relationship when filling masks.
Specifically, we use an external Knowledge Base (KB) to identify deterministic
relationships and continuously pre-train PLMs with the proposed methods. The
factual knowledge probing experiments indicate that the continuously
pre-trained PLMs achieve better robustness in factual knowledge capturing.
Further experiments on question-answering datasets show that trying to learn a
deterministic relationship with the proposed methods can also help other
knowledge-intensive tasks.
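The core idea above — mask a factual span only when the remaining context determines it uniquely, using a KB to decide — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names and the toy KB of (subject, relation) → objects mappings are assumptions for demonstration.

```python
# Hypothetical sketch of KB-guided deterministic masking: a factual span is
# masked only when the (subject, relation) clues left in the context admit
# exactly one object in the KB, so the masked content is deterministically
# inferable. The KB contents and helper names are illustrative.

MASK = "[MASK]"

# Toy KB: (subject, relation) -> set of valid objects.
KB = {
    ("Paris", "capital_of"): {"France"},          # deterministic: one object
    ("Turing", "field"): {"logic", "computing"},  # ambiguous: several objects
}

def is_deterministic(subject, relation):
    """The masked object is deterministically inferable iff the
    (subject, relation) pair maps to exactly one object in the KB."""
    return len(KB.get((subject, relation), set())) == 1

def mask_object(tokens, subject, relation, obj):
    """Replace the object token with [MASK] only for deterministic triples;
    otherwise leave the sentence unmasked (standing in for the random-masking
    fallback in this sketch)."""
    if not is_deterministic(subject, relation):
        return tokens  # no stable pattern for the PLM to learn; skip
    return [MASK if t == obj else t for t in tokens]

sent = ["Paris", "is", "the", "capital", "of", "France", "."]
masked = mask_object(sent, "Paris", "capital_of", "France")
print(masked)  # the object "France" is masked; the clues determine it
```

In the paper's setting, an external KB plays the role of the toy dictionary here, and the two proposed pre-training tasks then push the PLM to fill such masks from the deterministic clues rather than from spurious co-occurrence patterns.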
Related papers
- UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models [41.67393607081513]
Large Language Models (LLMs) often struggle to accurately express the factual knowledge they possess.
We propose the UAlign framework, which leverages Uncertainty estimations to represent knowledge boundaries.
We show that the proposed UAlign can significantly enhance the LLMs' capacities to confidently answer known questions.
arXiv Detail & Related papers (2024-12-16T14:14:27Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Analysing the Residual Stream of Language Models Under Knowledge Conflicts [23.96385393039587]
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters.
However, their parametric knowledge may conflict with the information provided in the context.
This can lead to undesirable model behaviour, such as reliance on outdated or incorrect information.
arXiv Detail & Related papers (2024-10-21T15:12:51Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
- Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models [2.3981254787726067]
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge.
This has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs.
In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge.
arXiv Detail & Related papers (2023-10-25T11:57:13Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base [51.55027623439027]
We develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner.
To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
arXiv Detail & Related papers (2022-04-17T12:33:34Z)
- RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.