SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
Generative Large Language Models
- URL: http://arxiv.org/abs/2303.08896v3
- Date: Wed, 11 Oct 2023 17:43:28 GMT
- Title: SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
Generative Large Language Models
- Authors: Potsawee Manakul, Adian Liusie, Mark J. F. Gales
- Abstract summary: "SelfCheckGPT" is a simple sampling-based approach to fact-check the responses of black-box models.
We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset.
- Score: 55.60306377044225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Large Language Models (LLMs) such as GPT-3 are capable of
generating highly fluent responses to a wide variety of user prompts. However,
LLMs are known to hallucinate facts and make non-factual statements which can
undermine trust in their output. Existing fact-checking approaches either
require access to the output probability distribution (which may not be
available for systems such as ChatGPT) or external databases that are
interfaced via separate, often complex, modules. In this work, we propose
"SelfCheckGPT", a simple sampling-based approach that can be used to fact-check
the responses of black-box models in a zero-resource fashion, i.e. without an
external database. SelfCheckGPT leverages the simple idea that if an LLM has
knowledge of a given concept, sampled responses are likely to be similar and
contain consistent facts. However, for hallucinated facts, stochastically
sampled responses are likely to diverge and contradict one another. We
investigate this approach by using GPT-3 to generate passages about individuals
from the WikiBio dataset, and manually annotate the factuality of the generated
passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and
factual sentences; and ii) rank passages in terms of factuality. We compare our
approach to several baselines and show that our approach has considerably
higher AUC-PR scores in sentence-level hallucination detection and higher
correlation scores in passage-level factuality assessment compared to grey-box
methods.
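The sampling-consistency idea in the abstract lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of that principle: each sentence of the main response is scored against several stochastically sampled responses for the same prompt, and low agreement flags a likely hallucination. Plain token overlap stands in for the paper's actual scoring functions, and all names and example passages are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the sampling-based consistency idea behind SelfCheckGPT.
# Each sentence of a main response is compared against N stochastically sampled
# responses; simple token overlap stands in for the paper's scoring variants,
# so treat this as an illustration of the principle, not a reference implementation.
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens for a crude lexical-overlap comparison."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def sentence_support(sentence: str, sample: str) -> float:
    """Fraction of the sentence's tokens that also appear in one sampled passage."""
    sent_toks = _tokens(sentence)
    if not sent_toks:
        return 0.0
    return len(sent_toks & _tokens(sample)) / len(sent_toks)


def hallucination_scores(response_sentences: list[str], samples: list[str]) -> list[float]:
    """Higher score = less supported by the samples = more likely hallucinated."""
    if not samples:
        raise ValueError("need at least one sampled response")
    scores = []
    for sentence in response_sentences:
        avg_support = sum(sentence_support(sentence, s) for s in samples) / len(samples)
        scores.append(1.0 - avg_support)
    return scores


if __name__ == "__main__":
    # Hypothetical main response (already sentence-split) and stochastic
    # re-samples for the same prompt; in practice all come from the same LLM.
    response = [
        "Alan Turing was born in London in 1912.",
        "He won the Nobel Prize in Physics in 1950.",  # fabricated claim
    ]
    samples = [
        "Alan Turing, born in London in 1912, was a pioneering computer scientist.",
        "Born in 1912 in London, Turing laid the foundations of computer science.",
        "Alan Turing (1912-1954) was a British mathematician born in London.",
    ]
    for sent, score in zip(response, hallucination_scores(response, samples)):
        print(f"{score:.2f}  {sent}")
```

In this toy run the factual sentence scores low (well supported by every sample) while the fabricated one scores high, mirroring the sentence-level detection and, by averaging sentence scores, the passage-level ranking described above.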
Related papers
- Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning [0.0]
"Hashing" involves masking potentially bias-inducing words in large language models with meaningless identifiers to reduce cognitive biases.
The method was tested across three sets of experiments involving a total of 490 prompts.
Overall, the method was shown to improve bias reduction and incorporation of external knowledge.
arXiv Detail & Related papers (2024-11-26T10:52:08Z)
- Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering [0.0]
Large Language Models (LLMs) and Knowledge Graphs (KGs) are combined to improve the accuracy and reliability of question-answering systems.
Our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries.
To make this approach accessible, a user-friendly web-based interface has been developed.
arXiv Detail & Related papers (2024-09-06T10:49:46Z)
- Can Language Models Explain Their Own Classification Behavior? [1.8177391253202122]
Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge.
This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes.
We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
arXiv Detail & Related papers (2024-05-13T02:31:08Z)
- LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop [7.77005079649294]
An effective auditing method is to probe a Large Language Model with different versions of the same question.
To operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically.
We propose the LLMAuditor framework, in which a different LLM, along with a human-in-the-loop (HIL), is used to create those probes.
This approach offers verifiability and transparency, while avoiding circular reliance on the same LLM.
arXiv Detail & Related papers (2024-02-14T17:49:31Z)
- Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document.
Preliminary experiments show that FacTool, FactScore and Perplexity struggle to identify false claims.
arXiv Detail & Related papers (2023-11-15T14:41:57Z)
- BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns each sentence a commonsense score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt and their generative capability, LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)