The Effect of Scaling, Retrieval Augmentation and Form on the Factual
Consistency of Language Models
- URL: http://arxiv.org/abs/2311.01307v1
- Date: Thu, 2 Nov 2023 15:20:11 GMT
- Title: The Effect of Scaling, Retrieval Augmentation and Form on the Factual
Consistency of Language Models
- Authors: Lovisa Hagstr\"om and Denitsa Saynova and Tobias Norlund and Moa
Johansson and Richard Johansson
- Abstract summary: Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions.
We identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus.
- Score: 7.2022332882214695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) make natural interfaces to factual knowledge,
but their usefulness is limited by their tendency to deliver inconsistent
answers to semantically equivalent questions. For example, a model might
predict both "Anne Redpath passed away in Edinburgh." and "Anne Redpath's life
ended in London." In this work, we identify potential causes of inconsistency
and evaluate the effectiveness of two mitigation strategies: up-scaling and
augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas
models show that both strategies reduce inconsistency while retrieval
augmentation is considerably more efficient. We further consider and
disentangle the consistency contributions of different components of Atlas. For
all LMs evaluated we find that syntactical form and other evaluation task
artifacts impact consistency. Taken together, our results provide a better
understanding of the factors affecting the factual consistency of language
models.
Related papers
- Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs [101.51435599249234]
We propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by the large language model (LLM)
Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects.
Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.
arXiv Detail & Related papers (2024-05-20T08:51:03Z) - Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z) - Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models [16.328341121232484]
We apply causal effect estimation strategies to measure the effect of context interventions.
We investigate robustness to irrelevant changes and sensitivity to impactful changes of Transformers.
arXiv Detail & Related papers (2024-04-03T10:22:35Z) - Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models [48.35385912526338]
This paper explores the impact of extending input lengths on the capabilities of Large Language Models (LLMs)
We isolate the effect of input length using multiple versions of the same sample, each being extended with padding of different lengths, types and locations.
We show that the degradation trend appears in every version of our dataset, although at different intensities.
arXiv Detail & Related papers (2024-02-19T16:04:53Z) - Enhancing Large Language Model with Decomposed Reasoning for Emotion
Cause Pair Extraction [13.245873138716044]
Emotion-Cause Pair Extraction (ECPE) involves extracting clause pairs representing emotions and their causes in a document.
Inspired by recent work, we explore leveraging large language model (LLM) to address ECPE task without additional training.
We introduce chain-of-thought to mimic human cognitive process and propose the Decomposed Emotion-Cause Chain (DECC) framework.
arXiv Detail & Related papers (2024-01-31T10:20:01Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - ReAct: Synergizing Reasoning and Acting in Language Models [44.746116256516046]
We show that large language models (LLMs) can generate both reasoning traces and task-specific actions in an interleaved manner.
We apply our approach, named ReAct, to a diverse set of language and decision making tasks.
ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API.
arXiv Detail & Related papers (2022-10-06T01:00:32Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Latent Opinions Transfer Network for Target-Oriented Opinion Words
Extraction [63.70885228396077]
We propose a novel model to transfer opinions knowledge from resource-rich review sentiment classification datasets to low-resource task TOWE.
Our model achieves better performance compared to other state-of-the-art methods and significantly outperforms the base model without transferring opinions knowledge.
arXiv Detail & Related papers (2020-01-07T11:50:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.