Chain-of-Verification Reduces Hallucination in Large Language Models
- URL: http://arxiv.org/abs/2309.11495v2
- Date: Mon, 25 Sep 2023 15:25:49 GMT
- Title: Chain-of-Verification Reduces Hallucination in Large Language Models
- Authors: Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
- Abstract summary: We study the ability of language models to deliberate on the responses they give in order to correct their mistakes.
We develop the Chain-of-Verification (CoVe) method whereby the model first drafts an initial response.
We show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata to closed-book MultiSpanQA.
- Score: 80.99318041981776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generation of plausible yet incorrect factual information, termed
hallucination, is an unsolved issue in large language models. We study the
ability of language models to deliberate on the responses they give in order to
correct their mistakes. We develop the Chain-of-Verification (CoVe) method
whereby the model first (i) drafts an initial response; then (ii) plans
verification questions to fact-check its draft; (iii) answers those questions
independently so the answers are not biased by other responses; and (iv)
generates its final verified response. In experiments, we show CoVe decreases
hallucinations across a variety of tasks, including list-based questions from
Wikidata, closed-book MultiSpanQA, and long-form text generation.
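Below is a minimal Python sketch of the four CoVe steps as they are laid out in the abstract. It assumes a generic llm callable (prompt in, completion out), and the prompt wording is illustrative; this is not the authors' implementation or their exact prompt templates.

```python
# A minimal sketch of the four CoVe steps described in the abstract.
# `llm` is a placeholder for any prompt -> completion function; the prompt
# wording below is illustrative, not the paper's exact templates.
from typing import Callable

def chain_of_verification(llm: Callable[[str], str], question: str) -> str:
    # (i) Draft an initial response.
    draft = llm(f"Answer the question.\nQuestion: {question}\nAnswer:")

    # (ii) Plan verification questions that fact-check the draft.
    plan = llm(
        "Write one fact-checking question per line for the claims in this answer.\n"
        f"Question: {question}\nDraft answer: {draft}\nVerification questions:"
    )
    verification_questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # (iii) Answer each verification question independently, without showing
    # the draft, so the answers are not biased by the original response.
    verifications = [
        (q, llm(f"Answer concisely and factually.\nQuestion: {q}\nAnswer:"))
        for q in verification_questions
    ]

    # (iv) Generate the final verified response, conditioned on the draft
    # and the independently produced verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications)
    return llm(
        "Revise the draft answer so it is consistent with the verification "
        "answers, dropping any claim they contradict.\n"
        f"Question: {question}\nDraft answer: {draft}\n"
        f"Verification Q&A:\n{evidence}\nFinal verified answer:"
    )
```

Step (iii) deliberately answers each verification question without showing the draft, which is the abstract's stated mechanism for keeping the verification answers unbiased by the model's earlier response.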
Related papers
- A Unified Hallucination Mitigation Framework for Large Vision-Language Models [18.595958586621943]
We present a unified framework, Dentist, for hallucination mitigation.
The core step is to first classify the queries and then apply a different hallucination-mitigation process depending on the classification result.
On MMBench, we achieve a 13.44%/10.2%/15.8% improvement in accuracy on Image Quality.
arXiv Detail & Related papers (2024-09-24T22:36:58Z)
- Localizing and Mitigating Errors in Long-form Question Answering [79.63372684264921]
Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension.
This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.
arXiv Detail & Related papers (2024-07-16T17:23:16Z)
- Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information.
We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets.
Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The self-alignment method is capable of not only refusing to answer but also providing an explanation of why an unknown question is unanswerable.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
- Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification [18.59695929601458]
We introduce a novel approach called Real-time Verification and Rectification (Ever).
Ever employs a real-time, step-wise generation and hallucination rectification strategy.
Ever demonstrates a significant improvement in generating trustworthy and factually accurate text across a diverse range of tasks.
arXiv Detail & Related papers (2023-11-15T17:04:56Z)
- Weakly Supervised Visual Question Answer Generation [2.7605547688813172]
We present a weakly supervised method that synthetically generates question-answer pairs procedurally from visual information and captions.
We perform an exhaustive experimental analysis on the VQA dataset and find that our model significantly outperforms SOTA methods on BLEU scores.
arXiv Detail & Related papers (2023-06-11T08:46:42Z)
- CLAM: Selective Clarification for Ambiguous Questions with Large Language Models [37.37606905433334]
We show that current SotA models do not ask the user for clarification when presented with imprecise questions.
We introduce CLAM, a framework that first uses the model to detect ambiguous questions and, if one is detected, prompts the model to ask the user for clarification.
We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering dataset.
arXiv Detail & Related papers (2022-12-15T12:47:18Z)
- Read before Generate! Faithful Long Form Question Answering with Machine Reading [77.17898499652306]
Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question.
We propose a new end-to-end framework that jointly models answer generation and machine reading.
arXiv Detail & Related papers (2022-03-01T10:41:17Z)