LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements
- URL: http://arxiv.org/abs/2404.06283v1
- Date: Tue, 9 Apr 2024 13:08:56 GMT
- Title: LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements
- Authors: Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
- Abstract summary: The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
- Score: 59.71218039095155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of reading comprehension (RC), often implemented as context-based question answering (QA), provides a primary means to assess language models' natural language understanding (NLU) capabilities. Yet, when applied to large language models (LLMs) with extensive built-in world knowledge, this method can be deceptive. If the context aligns with the LLMs' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from LLMs' internal information. Conversely, using data that conflicts with the models' knowledge creates erroneous trends that distort the results. To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities. This task is entirely independent of the models' world knowledge, enabling us to evaluate LLMs' linguistic abilities without the interference of parametric knowledge. Testing ChatGPT, GPT-4, LLaMA 2 and Mixtral on such imaginary data, we uncover a class of linguistic phenomena posing a challenge to current LLMs, involving thinking in terms of alternative, hypothetical scenarios. While all the models handle simple affirmative and negative contexts with high accuracy, they are much more prone to error when dealing with modal and conditional contexts. Crucially, these phenomena also trigger the LLMs' vulnerability to knowledge conflicts again. In particular, while some models prove virtually unaffected by knowledge conflicts in affirmative and negative contexts, when faced with more semantically involved modal and conditional environments, they often fail to separate the text from their internal knowledge.
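As an illustration of the evaluation setup described in the abstract, the sketch below shows how reading-comprehension probes over imaginary data might be constructed for the four context types studied (affirmative, negative, modal, conditional), along with a knowledge-conflict variant. This is a minimal, hypothetical sketch: the fictitious entity, the example contexts, and the `query_llm` stub are illustrative assumptions, not material from the paper.

```python
# Hypothetical sketch (not the authors' released code): building RC probes on
# imaginary data so that answers cannot come from parametric knowledge.

# The entity "glorbium" is fictitious, so any correct answer must be read off the context.
PROBES = {
    "affirmative": "Glorbium is mined on the island of Vantara.",
    "negative":    "Glorbium is not mined on the island of Vantara.",
    "modal":       "Glorbium might be mined on the island of Vantara.",
    "conditional": "If the northern shaft reopens, glorbium will be mined on the island of Vantara.",
}
QUESTION = "Is glorbium mined on the island of Vantara? Answer yes, no, or unknown."

# A knowledge-conflict variant uses a real entity but contradicts world knowledge,
# so answering from the context and answering from memory diverge.
CONFLICT_CONTEXT = "Gold is not mined anywhere on Earth."
CONFLICT_QUESTION = "Is gold mined on Earth? Answer yes, no, or unknown according to the text."


def build_prompt(context: str, question: str) -> str:
    """Format a context-based QA prompt; the model should rely only on the given text."""
    return f"Read the text and answer based only on it.\n\nText: {context}\n\nQuestion: {question}"


def query_llm(prompt: str) -> str:
    """Placeholder for an actual model call (an API client or a local checkpoint)."""
    raise NotImplementedError("plug in your own model client here")


if __name__ == "__main__":
    for label, context in PROBES.items():
        print(f"--- {label} ---")
        print(build_prompt(context, QUESTION))
        # expected: 'yes' (affirmative), 'no' (negative), 'unknown' (modal, conditional)
    print("--- knowledge conflict ---")
    print(build_prompt(CONFLICT_CONTEXT, CONFLICT_QUESTION))
    # a context-faithful model should answer 'no' here despite its parametric knowledge
```

In this setup, errors on the modal and conditional probes, or answers that revert to world knowledge on the conflict probe, would reflect the failure modes the paper reports.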
Related papers
- Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective [5.769786334333616]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) based applications including automated text generation, question answering, and others.
They face a significant challenge: hallucinations, where models produce plausible-sounding but factually incorrect responses.
This paper discusses these open challenges, covering state-of-the-art datasets and benchmarks as well as methods for knowledge integration and evaluating hallucinations.
arXiv Detail & Related papers (2024-11-21T16:09:05Z)
- Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning [26.861562920084264]
Large language models (LLMs) are applied across diverse domains.
We propose a novel method termed "in-context knowledge unlearning".
Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context.
arXiv Detail & Related papers (2024-10-01T04:13:25Z)
- Misinforming LLMs: vulnerabilities, challenges and opportunities [4.54019093815234]
Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood.
This paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors.
Research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs.
arXiv Detail & Related papers (2024-08-02T10:35:49Z)
- Towards Logically Consistent Language Models via Probabilistic Reasoning [14.317886666902822]
Large language models (LLMs) are a promising avenue for natural language understanding and generation tasks.
LLMs are prone to generate non-factual information and to contradict themselves when prompted to reason about beliefs of the world.
We introduce a training objective that teaches an LLM to be consistent with external knowledge in the form of a set of facts and rules.
arXiv Detail & Related papers (2024-04-19T12:23:57Z)
- Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models [51.72963030032491]
Knowledge documents for large language models (LLMs) may conflict with the memory of LLMs due to outdated or incorrect knowledge.
We construct a new dataset, dubbed KNOT, for examining knowledge conflict resolution in the form of question answering.
arXiv Detail & Related papers (2024-04-04T16:40:11Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.