Why LLMs Hallucinate, and How to Get (Evidential) Closure: Perceptual,
Intensional, and Extensional Learning for Faithful Natural Language
Generation
- URL: http://arxiv.org/abs/2310.15355v1
- Date: Mon, 23 Oct 2023 20:35:52 GMT
- Title: Why LLMs Hallucinate, and How to Get (Evidential) Closure: Perceptual,
Intensional, and Extensional Learning for Faithful Natural Language
Generation
- Authors: Adam Bouyamourn
- Abstract summary: We show that LLMs hallucinate because their output is not constrained to be synonymous with claims for which they have evidence.
We then show how to constrain LLMs to produce output that does satisfy evidential closure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We show that LLMs hallucinate because their output is not constrained to be
synonymous with claims for which they have evidence: a condition that we call
evidential closure. Information about the truth or falsity of sentences is not
statistically identified in the standard neural probabilistic language model
setup, and so cannot be conditioned on to generate new strings. We then show
how to constrain LLMs to produce output that does satisfy evidential closure. A
multimodal LLM must learn about the external world (perceptual learning); it
must learn a mapping from strings to states of the world (extensional
learning); and, to achieve fluency when generalizing beyond a body of evidence,
it must learn mappings from strings to their synonyms (intensional learning).
The output of a unimodal LLM must be synonymous with strings in a validated
evidence set. Finally, we present a heuristic procedure, Learn-Babble-Prune,
that yields faithful output from an LLM by rejecting output that is not
synonymous with claims for which the LLM has evidence.
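The abstract describes Learn-Babble-Prune only at a high level. A minimal rejection-sampling sketch of that idea, assuming hypothetical generate_candidates, evidence_set, and is_synonymous helpers rather than the authors' implementation, might look like this:
```python
# Minimal sketch of a Learn-Babble-Prune-style rejection loop (an illustration
# of the idea in the abstract, not the authors' reference implementation).
from typing import Callable, Iterable, List


def learn_babble_prune(
    generate_candidates: Callable[[str, int], List[str]],  # hypothetical LLM sampler ("babble")
    evidence_set: Iterable[str],                            # validated claims the model has evidence for ("learn")
    is_synonymous: Callable[[str, str], bool],              # hypothetical synonymy / entailment check
    prompt: str,
    n_samples: int = 16,
) -> List[str]:
    """Keep only candidate outputs synonymous with some evidenced claim ("prune")."""
    faithful = []
    for candidate in generate_candidates(prompt, n_samples):
        if any(is_synonymous(candidate, claim) for claim in evidence_set):
            faithful.append(candidate)
    return faithful  # may be empty if no candidate survives pruning
```
In this sketch the pruning step enforces evidential closure by construction: any surviving output is synonymous with some claim in the validated evidence set.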
Related papers
- DecoPrompt : Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises [28.72485319617863]
We propose a new prompting algorithm, named DecoPrompt, to mitigate hallucination.
DecoPrompt leverages LLMs to "decode" the false-premise prompts without really eliciting hallucination output from LLMs.
We perform experiments on two datasets, demonstrating that DecoPrompt can reduce hallucinations effectively on outputs from different LLMs.
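The summary does not spell out how DecoPrompt "decodes" a prompt. As a loosely related, purely illustrative sketch (an assumption, not the authors' algorithm), one can at least score how surprising a false-premise prompt is to the model before answering it:
```python
# Hypothetical illustration: score a prompt's average token surprisal under a causal LM.
# This is NOT the DecoPrompt algorithm; it is a generic premise-scoring sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


def prompt_surprisal(prompt: str) -> float:
    """Mean negative log-likelihood per token; higher values may warrant a premise check."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()  # cross-entropy averaged over tokens


print(prompt_surprisal("Why did Einstein win the 1950 Nobel Prize in Literature?"))
```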
arXiv Detail & Related papers (2024-11-12T00:48:01Z)
- LLMs' Understanding of Natural Language Revealed [0.0]
Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale.
We will focus on testing LLMs for their language understanding capabilities, their supposed forte.
arXiv Detail & Related papers (2024-07-29T01:21:11Z)
- A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation [72.93327642336078]
We propose Belief Tree Propagation (BTProp), a probabilistic framework for hallucination detection.
BTProp introduces a belief tree of logically related statements by decomposing a parent statement into child statements.
Our method improves baselines by 3%-9% (evaluated by AUROC and AUC-PR) on multiple hallucination detection benchmarks.
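A toy version of the belief-tree construction described above, assuming hypothetical decompose and confidence helpers backed by an LLM, and a placeholder averaging rule in place of the paper's propagation scheme:
```python
# Toy belief-tree construction in the spirit of the BTProp description above
# (a sketch under stated assumptions, not the authors' implementation).
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BeliefNode:
    statement: str
    belief: float                              # model confidence that the statement is true, in [0, 1]
    children: List["BeliefNode"] = field(default_factory=list)


def build_belief_tree(
    statement: str,
    decompose: Callable[[str], List[str]],     # hypothetical LLM call: parent -> child statements
    confidence: Callable[[str], float],        # hypothetical per-statement confidence scorer
    max_depth: int = 2,
) -> BeliefNode:
    node = BeliefNode(statement, confidence(statement))
    if max_depth > 0:
        for child in decompose(statement):
            node.children.append(build_belief_tree(child, decompose, confidence, max_depth - 1))
    return node


def propagate(node: BeliefNode) -> float:
    """Combine children's beliefs with the node's own score (simple averaging as a placeholder)."""
    if not node.children:
        return node.belief
    child_belief = sum(propagate(c) for c in node.children) / len(node.children)
    return 0.5 * node.belief + 0.5 * child_belief
```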
arXiv Detail & Related papers (2024-06-11T05:21:37Z)
- Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias.
We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics.
We find interesting potential: LLMs can indeed capture semantic structures, but scaling up does not always translate into corresponding gains.
We are surprised to discover that LLMs and untrained humans make significantly overlapping errors, which account for almost 30% of all errors.
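One way to picture such an SRL probe (the prompt format and query_llm helper are assumptions, not the paper's protocol) is to ask the model for predicate-argument structure and check it against gold roles:
```python
# Illustrative SRL probe: ask an LLM for predicate-argument structure and compare to gold labels.
# The prompt template and `query_llm` helper are assumptions, not the paper's protocol.
from typing import Callable, Dict

PROBE_TEMPLATE = (
    "Identify the predicate and its arguments in the sentence below.\n"
    "Sentence: {sentence}\n"
    "Answer as: predicate=...; ARG0=...; ARG1=..."
)


def srl_probe(sentence: str, gold: Dict[str, str], query_llm: Callable[[str], str]) -> float:
    """Return the fraction of gold role spans the model's answer mentions verbatim (a crude score)."""
    answer = query_llm(PROBE_TEMPLATE.format(sentence=sentence)).lower()
    hits = sum(1 for span in gold.values() if span.lower() in answer)
    return hits / max(len(gold), 1)


# Example gold annotation for "The committee approved the budget on Friday."
gold_roles = {"ARG0": "The committee", "ARG1": "the budget", "ARGM-TMP": "on Friday"}
```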
arXiv Detail & Related papers (2024-05-10T11:44:05Z)
- FLAME: Factuality-Aware Alignment for Large Language Models [86.76336610282401]
The conventional alignment process fails to enhance the factual accuracy of large language models (LLMs).
We identify factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL).
We propose factuality-aware alignment, consisting of factuality-aware SFT and factuality-aware RL through direct preference optimization.
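The RL step is described as direct preference optimization with factuality awareness. A minimal DPO-style loss in which the "chosen" response is simply the more factual one (how FLAME actually builds its preference pairs is not specified here) looks like this:
```python
# Sketch of a DPO-style loss where the "chosen" response is the more factual one.
# Ranking the pair by a factuality score is an assumption for illustration.
import torch
import torch.nn.functional as F


def dpo_loss(
    logp_chosen: torch.Tensor,       # log pi_theta(y_more_factual | x)
    logp_rejected: torch.Tensor,     # log pi_theta(y_less_factual | x)
    ref_logp_chosen: torch.Tensor,   # same quantities under the frozen reference policy
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective; factuality enters only through how each pair is ordered."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```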
arXiv Detail & Related papers (2024-05-02T17:54:54Z)
- Caveat Lector: Large Language Models in Legal Practice [0.0]
The fascination with large language models derives from the fact that many users lack the expertise to evaluate the quality of the generated text.
The dangerous combination of fluency and superficial plausibility leads to the temptation to trust the generated text and creates the risk of overreliance.
arXiv Detail & Related papers (2024-03-14T08:19:41Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? [73.454943870226]
This work studies a specific type of hallucination induced by semantic associations.
To quantify this phenomenon, we propose a novel probing method and benchmark called EureQA.
arXiv Detail & Related papers (2023-11-16T09:27:36Z)
- Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
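MONITOR's formula is not given in this summary. A rough stand-in, offered only as an assumption, is to measure how often the model returns the gold answer across paraphrases of the same factual question:
```python
# Rough stand-in for a factual-reliability score (an assumption; not the MONITOR formula):
# how often does the model give the gold answer across paraphrases of the same question?
from typing import Callable, List


def reliability_score(
    paraphrases: List[str],              # paraphrased prompts asking for the same fact
    gold_answer: str,
    query_llm: Callable[[str], str],     # hypothetical LLM call
) -> float:
    correct = sum(1 for p in paraphrases if gold_answer.lower() in query_llm(p).lower())
    return correct / max(len(paraphrases), 1)
```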
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
- The Internal State of an LLM Knows When It's Lying [18.886091925252174]
Large Language Models (LLMs) have shown exceptional performance in various tasks.
One of their most prominent drawbacks is generating inaccurate or false information with a confident tone.
We provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements.
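A minimal version of this idea is a linear probe trained on hidden activations of true and false statements; the model choice and last-token feature extraction below are assumptions for illustration:
```python
# Linear probe on an LLM's hidden states to predict statement truthfulness
# (a sketch of the general idea; model and layer choice are assumptions).
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)


def last_token_state(statement: str, layer: int = -1):
    """Hidden state of the final token at the chosen layer, as a feature vector."""
    inputs = tokenizer(statement, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden[0, -1].numpy()


def train_probe(statements, labels):
    """statements: list of strings; labels: 1 for true, 0 for false (user-provided data)."""
    features = [last_token_state(s) for s in statements]
    return LogisticRegression(max_iter=1000).fit(features, labels)
```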
arXiv Detail & Related papers (2023-04-26T02:49:38Z)