SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- URL: http://arxiv.org/abs/2303.08896v3
- Date: Wed, 11 Oct 2023 17:43:28 GMT
- Authors: Potsawee Manakul, Adian Liusie, Mark J. F. Gales
- Abstract summary: "SelfCheckGPT" is a simple sampling-based approach to fact-check the responses of black-box models.
We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset.
- Score: 55.60306377044225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Large Language Models (LLMs) such as GPT-3 are capable of
generating highly fluent responses to a wide variety of user prompts. However,
LLMs are known to hallucinate facts and make non-factual statements which can
undermine trust in their output. Existing fact-checking approaches either
require access to the output probability distribution (which may not be
available for systems such as ChatGPT) or external databases that are
interfaced via separate, often complex, modules. In this work, we propose
"SelfCheckGPT", a simple sampling-based approach that can be used to fact-check
the responses of black-box models in a zero-resource fashion, i.e. without an
external database. SelfCheckGPT leverages the simple idea that if an LLM has
knowledge of a given concept, sampled responses are likely to be similar and
contain consistent facts. However, for hallucinated facts, stochastically
sampled responses are likely to diverge and contradict one another. We
investigate this approach by using GPT-3 to generate passages about individuals
from the WikiBio dataset, and manually annotate the factuality of the generated
passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and
factual sentences; and ii) rank passages in terms of factuality. We compare our
approach to several baselines and show that our approach has considerably
higher AUC-PR scores in sentence-level hallucination detection and higher
correlation scores in passage-level factuality assessment compared to grey-box
methods.
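As a concrete illustration of this consistency idea, the sketch below samples several stochastic responses and scores each sentence of the main response by how little support it finds among the samples. This is a minimal sketch, not the paper's implementation: `sample_fn` is a stand-in for any black-box LLM API, and unigram overlap is a crude proxy for the paper's BERTScore, question-answering, and n-gram scoring variants.

```python
# Minimal sketch of SelfCheckGPT-style consistency scoring (illustrative
# proxy only, not the paper's BERTScore/QA/n-gram variants).
import re

def split_sentences(text):
    # Naive splitter; any proper sentence segmenter would do.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def support(sentence, passage):
    # Fraction of the sentence's words that also occur in the passage.
    words = set(sentence.lower().split())
    return len(words & set(passage.lower().split())) / max(len(words), 1)

def selfcheck(response, prompt, sample_fn, n_samples=5):
    """Score each sentence of `response`; higher = more likely hallucinated.

    `sample_fn(prompt) -> str` is an assumed hook around a black-box LLM,
    called with temperature > 0 so the samples are stochastic.
    """
    samples = [sample_fn(prompt) for _ in range(n_samples)]
    scores = []
    for sent in split_sentences(response):
        # Facts the model actually knows should reappear across samples;
        # hallucinated facts tend to be absent from or contradicted by them.
        avg_support = sum(support(sent, s) for s in samples) / len(samples)
        scores.append((sent, 1.0 - avg_support))
    return scores
```

Averaging the per-sentence scores yields a passage-level score, mirroring the paper's passage-level factuality ranking.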
Related papers
- CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models [19.209135063841895]
This work proposes "CrossCheckGPT", a reference-free universal hallucination ranking for multimodal foundation models.
The core idea of CrossCheckGPT is that the same hallucinated content is unlikely to be generated by different independent systems.
We showcase the applicability of our method for hallucination ranking across various modalities, namely the text, image, and audio-visual domains.
arXiv Detail & Related papers (2024-05-22T14:25:41Z)
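A hedged sketch of the cross-system variant of the same consistency check: instead of re-sampling one model, compare the outputs of independent systems and penalize content that no other system reproduces. The system names, the naive sentence splitting, and the word-overlap proxy are illustrative assumptions, not CrossCheckGPT's actual consistency measures.

```python
# Sketch of CrossCheckGPT-style reference-free ranking (text modality only).
# `outputs` maps hypothetical system names to their responses to one prompt;
# at least two systems are assumed.
import re

def _sents(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def _support(sent, passage):
    words = set(sent.lower().split())
    return len(words & set(passage.lower().split())) / max(len(words), 1)

def crosscheck_rank(outputs):
    """Rank systems from least to most hallucinatory: content that no
    independent system echoes is treated as likely hallucinated."""
    ranking = []
    for name, text in outputs.items():
        others = [t for n, t in outputs.items() if n != name]
        sents = _sents(text)
        avg = sum(max(_support(s, o) for o in others)
                  for s in sents) / max(len(sents), 1)
        ranking.append((name, 1.0 - avg))
    return sorted(ranking, key=lambda kv: kv[1])

# e.g. crosscheck_rank({"system_a": out_a, "system_b": out_b, "system_c": out_c})
```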
- Can Language Models Explain Their Own Classification Behavior? [1.8177391253202122]
Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge.
This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes.
We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
arXiv Detail & Related papers (2024-05-13T02:31:08Z)
- Comparing Plausibility Estimates in Base and Instruction-Tuned Large Language Models [50.15455336684986]
We compare base and instruction-tuned LLM performance on an English sentence plausibility task via explicit prompting and implicit estimation.
Experiment 1 shows that, across model architectures and plausibility datasets, log likelihood ($\textit{LL}$) scores are the most reliable indicator of sentence plausibility.
Experiment 2 shows that $\textit{LL}$ scores across models are modulated by context in the expected way, showing high performance on three metrics of context-sensitive plausibility.
arXiv Detail & Related papers (2024-03-21T22:08:44Z)
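A minimal sketch of the implicit $\textit{LL}$ estimation, assuming a Hugging Face causal LM ("gpt2" is only a placeholder; the paper compares base and instruction-tuned models): a sentence's plausibility score is its total token log likelihood.

```python
# Sketch: sentence plausibility as total log likelihood under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model choice
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_ll(sentence):
    """Total log likelihood of the sentence; higher = more plausible."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, loss is the mean negative log likelihood
        # over the len-1 next-token predictions.
        loss = lm(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Expected ordering on a plausibility pair:
# sentence_ll("The teacher bought a book.") > sentence_ll("The book bought a teacher.")
```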
- LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop [7.77005079649294]
An effective auditing method is to probe a large language model with different versions of the same question.
To operationalize this method at scale, those probes must be created reliably and automatically.
We propose the LLMAuditor framework, in which a different LLM is used together with human-in-the-loop (HIL) verification.
This approach offers verifiability and transparency while avoiding circular reliance on the same LLM.
arXiv Detail & Related papers (2024-02-14T17:49:31Z)
- Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document.
Preliminary experiments show that FacTool, FactScore, and Perplexity struggle to identify false claims.
arXiv Detail & Related papers (2023-11-15T14:41:57Z)
- BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
Given a suitable prompt, an LLM's generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
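A hedged sketch of this kind of generative error correction: format the N-best hypotheses into a prompt and ask an LLM for the most likely transcription. The prompt wording and the `llm` completion hook are assumptions, not the benchmark's actual templates.

```python
# Sketch: LLM-based ASR error correction from an N-best hypothesis list.
# `llm(prompt) -> str` is an assumed text-completion hook.

def correct_from_nbest(hypotheses, llm):
    """Infer the most likely true transcription from N-best hypotheses."""
    listing = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    prompt = (
        "The following are N-best hypotheses from a speech recognizer.\n"
        f"{listing}\n"
        "Report the most likely true transcription. You may restore words "
        "that are wrong or missing in every hypothesis.\nTranscription:"
    )
    return llm(prompt).strip()
```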
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
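A minimal sketch of a recite-and-answer scheme as summarized above: the model first recites relevant knowledge from its own parameters, then answers conditioned on that recitation. The prompts and the `llm` hook are illustrative assumptions, not the paper's templates.

```python
# Sketch: recite-and-answer prompting with recitation as the intermediate step.
# `llm(prompt) -> str` is an assumed text-completion hook.

def recite_and_answer(question, llm):
    # Step 1: recite from the model's own memorized knowledge, with no
    # external retriever involved.
    recitation = llm(
        "Recite a short passage you know that is relevant to this question.\n"
        f"Question: {question}\nPassage:"
    )
    # Step 2: answer grounded in the recited passage.
    return llm(f"Passage: {recitation}\nQuestion: {question}\nAnswer:")
```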
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.