Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
- URL: http://arxiv.org/abs/2407.02352v2
- Date: Tue, 29 Oct 2024 01:09:44 GMT
- Title: Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
- Authors: Pritish Sahu, Karan Sikka, Ajay Divakaran
- Abstract summary: Pelican is a framework designed to detect and mitigate hallucinations through claim verification.
Our experiments reveal a drop in hallucination rate by 8% - 32% across various baseline LVLMs and a 27% drop compared to approaches proposed for hallucination mitigation on MMHal-Bench.
- Score: 13.081342795985003
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Visual Language Models (LVLMs) struggle with hallucinations in visual instruction-following tasks, limiting their trustworthiness and real-world applicability. We propose Pelican -- a novel framework designed to detect and mitigate hallucinations through claim verification. Pelican first decomposes the visual claim into a chain of sub-claims based on first-order predicates. These sub-claims consist of (predicate, question) pairs and can be conceptualized as nodes of a computational graph. We then use Program-of-Thought prompting to generate Python code for answering these questions through flexible composition of external tools. Pelican improves over prior work by introducing (1) intermediate variables for precise grounding of object instances, and (2) shared computation for answering sub-questions to enable adaptive corrections and inconsistency identification. We finally use the reasoning abilities of LLMs to verify the correctness of the claim by considering the consistency and confidence of the (question, answer) pairs from each sub-claim. Our experiments reveal a drop in hallucination rate of ~8%-32% across various baseline LVLMs and a 27% drop compared to approaches proposed for hallucination mitigation on MMHal-Bench. Results on two other benchmarks further corroborate our results.
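The abstract describes a three-stage pipeline: decomposition of a visual claim into first-order (predicate, question) sub-claims, Program-of-Thought answering of each sub-question by composing external tools with shared intermediate variables, and a final verification step over the (question, answer) pairs. The Python sketch below only illustrates that flow under stated assumptions; the function names, the `detect`/`vqa` tool stubs, and the simple conjunction used for the final verdict are hypothetical placeholders, not the authors' prompts, tools, or verification logic.

```python
# Minimal, illustrative sketch of a Pelican-style verification pipeline.
# All names (SubClaim, detect, vqa, etc.) are hypothetical placeholders,
# not the paper's implementation or any real library API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubClaim:
    predicate: str   # first-order predicate, e.g. "holding(person_1, umbrella_1)"
    question: str    # natural-language question used to verify the predicate


def decompose(claim: str) -> List[SubClaim]:
    """Decompose a visual claim into (predicate, question) sub-claims.
    In the paper this is done by prompting an LLM; here it is hard-coded
    for one example claim."""
    return [
        SubClaim("exists(person_1)", "Is there a person in the image?"),
        SubClaim("exists(umbrella_1)", "Is there an umbrella in the image?"),
        SubClaim("holding(person_1, umbrella_1)",
                 "Is the person holding the umbrella?"),
    ]


def program_of_thought(sub: SubClaim,
                       tools: Dict[str, Callable],
                       memo: Dict[str, object]) -> bool:
    """Answer one sub-question by composing external tools.
    `memo` holds intermediate variables (e.g. grounded object boxes) so
    later sub-claims can reuse earlier computation, as the abstract describes."""
    if sub.predicate.startswith("exists"):
        name = sub.predicate[len("exists("):-1]          # e.g. "person_1"
        boxes = tools["detect"](name.split("_")[0])      # ground the object
        memo[name] = boxes                               # shared intermediate variable
        return len(boxes) > 0
    # Relational predicates fall back to a VQA tool over the grounded regions.
    return tools["vqa"](sub.question, memo) == "yes"


def verify_claim(claim: str, tools: Dict[str, Callable]) -> bool:
    """Accept the claim only if every sub-claim is verified; otherwise flag it
    as a likely hallucination. (The paper uses an LLM to weigh consistency and
    confidence of the (question, answer) pairs; a plain conjunction is used
    here for simplicity.)"""
    memo: Dict[str, object] = {}
    answers = [program_of_thought(s, tools, memo) for s in decompose(claim)]
    return all(answers)


if __name__ == "__main__":
    # Stub tools standing in for an object detector and a VQA model.
    tools = {
        "detect": lambda label: [(10, 20, 50, 80)] if label == "person" else [],
        "vqa": lambda question, memo: "no",
    }
    print(verify_claim("A person is holding an umbrella.", tools))  # -> False
```

In this toy run the umbrella is never grounded, so the existence and relation sub-claims fail and the claim is rejected; the real system would instead pass the per-sub-claim answers and confidences back to an LLM for the final judgment.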
Related papers
- Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception [28.351994916635423]
We discuss the vulnerability of LVLMs in solving counterfactual presupposition questions (CPQs).
We introduce "Antidote", a unified, synthetic data-driven post-training framework for mitigating both types of hallucination.
We construct "CP-Bench", a novel benchmark to evaluate LVLMs' ability to correctly handle CPQs and produce factual responses.
arXiv Detail & Related papers (2025-04-29T07:05:24Z) - HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination".
This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z) - Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment [54.62926010621013]
We introduce a novel task, code reasoning, to provide a new perspective for the reasoning abilities of large language models.
We summarize three meta-benchmarks based on established forms of logical reasoning, and instantiate these into eight specific benchmark tasks.
We present a new pathway exploration pipeline inspired by human intricate problem-solving methods.
arXiv Detail & Related papers (2025-02-17T10:39:58Z) - GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation [62.63014905981601]
Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge.
Effective RAIT must address two key challenges: first, effectively rejecting unknown questions to minimize hallucinations; second, avoiding over-refusal so that questions that can be correctly answered are not rejected.
GRAIT (1) employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal.
arXiv Detail & Related papers (2025-02-09T14:11:30Z) - Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.
They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.
We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z) - Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation [42.76770979205655]
Large language models (LLMs) are susceptible to generating hallucinated information, despite the integration of retrieval-augmented generation (RAG).
In this paper, we propose DePaC, which alleviates the hallucination problem with context-aware negative training and information-calibrated aggregation.
arXiv Detail & Related papers (2024-12-19T14:37:11Z) - DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations [14.025772159366184]
Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs.
Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads.
We propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy.
arXiv Detail & Related papers (2024-10-24T15:44:33Z) - Trust but Verify: Programmatic VLM Evaluation in the Wild [62.14071929143684]
Programmatic VLM Evaluation (PROVE) is a new benchmarking paradigm for evaluating VLM responses to open-ended queries.
We benchmark the helpfulness-truthfulness trade-offs of a range of VLMs on PROVE, finding that very few are in fact able to achieve a good balance between the two.
arXiv Detail & Related papers (2024-10-17T01:19:18Z) - FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning [10.709365940160685]
Existing approaches primarily detect the presence of hallucinations but lack a nuanced understanding of their types and manifestations.
We introduce a comprehensive taxonomy that categorizes the common hallucinations in mathematical reasoning tasks into six types.
We then propose FG-PRM, an augmented model designed to detect and mitigate hallucinations in a fine-grained, step-level manner.
arXiv Detail & Related papers (2024-10-08T19:25:26Z) - Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [36.119299938503936]
Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks.
They remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions.
We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning.
arXiv Detail & Related papers (2024-07-16T06:32:45Z) - ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications.
Current hallucination detection and mitigation datasets are limited in domains and sizes.
This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs [45.38821594541265]
Large Language Models (LLMs) excel in various natural language processing tasks but struggle with hallucination issues.
We propose a CounterFactual Multi-Agent Debate (CFMAD) framework to override LLMs' inherent biases for answer inspection.
arXiv Detail & Related papers (2024-06-17T13:21:23Z) - Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem [58.3723958800254]
Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks.
They are susceptible to producing unreliable conjectures in ambiguous contexts, a phenomenon called hallucination.
This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP).
arXiv Detail & Related papers (2024-03-06T09:06:34Z) - Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models [52.957842999317506]
Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image.
We propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT.
As a plug-and-play method, it can be seamlessly applied to all existing LVLMs.
arXiv Detail & Related papers (2024-02-18T15:28:39Z) - Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z)