Related papers: MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context

URL: http://arxiv.org/abs/2407.02730v1
Date: Wed, 3 Jul 2024 00:59:03 GMT
Title: MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
Authors: Zishan Gu, Changchang Yin, Fenglin Liu, Ping Zhang
Abstract summary: Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data. Despite their advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets. We introduce a new benchmark dataset, the Medical Visual Hallucination Test (MedVH), to evaluate the hallucination of domain-specific LVLMs.
Score: 21.562034852024272
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data, which has inspired a large number of studies on LVLM fine-tuning and training. Despite their advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets. In this study, we introduce a new benchmark dataset, the Medical Visual Hallucination Test (MedVH), to evaluate the hallucination of domain-specific LVLMs. MedVH comprises five tasks to evaluate hallucinations in LVLMs within the medical context, which include tasks for comprehensive understanding of textual and visual input, as well as long textual response generation. Our extensive experiments with both general and medical LVLMs reveal that, although medical LVLMs demonstrate promising performance on standard medical tasks, they are particularly susceptible to hallucinations, often more so than the general models, raising significant concerns about the reliability of these domain-specific models. For medical LVLMs to be truly valuable in real-world applications, they must not only accurately integrate medical knowledge but also maintain robust reasoning abilities to prevent hallucination. Our work paves the way for future evaluations of these studies.

Related papers

MedHEval: Benchmarking Hallucinations and Mitigation Strategies in Medical Large Vision-Language Models [37.78272983522441]
Large Vision Language Models (LVLMs) are becoming increasingly important in the medical domain. MedHEval is a novel benchmark that systematically evaluates hallucinations and mitigation strategies in Med-LVLMs. We conduct experiments across 11 popular (Med)-LVLMs and evaluate 7 state-of-the-art hallucination mitigation techniques.
arXiv Detail & Related papers (2025-03-04T00:40:09Z)
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models [81.64135119165277]
Hallucinations can jeopardize clinical decision making, potentially harming diagnosis and treatment. We propose MedHallTune, a large-scale benchmark designed specifically to evaluate and mitigate hallucinations in medical VLMs. We conduct a comprehensive evaluation of current medical and general VLMs using MedHallTune, assessing their performance across key metrics, including clinical accuracy, relevance, detail level, and risk level.
arXiv Detail & Related papers (2025-02-28T06:59:49Z)
Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. These models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images. We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
A Survey of Hallucination in Large Visual Language Models [48.794850395309076]
The existence of hallucinations has limited the potential and practical effectiveness of LVLMs in various fields. The structure of LVLMs and the main causes of hallucination generation are introduced. The available hallucination evaluation benchmarks for LVLMs are presented.
arXiv Detail & Related papers (2024-10-20T10:58:58Z)
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [53.89380284760555]
Large vision-language models (LVLMs) produce captions that mention concepts that cannot be found in the image. These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles to their ubiquitous adoption. Recent work suggests that addition of grounding objectives -- those that explicitly align image regions or objects to text spans -- reduces the amount of LVLM hallucination.
arXiv Detail & Related papers (2024-06-20T16:56:11Z)
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models [22.30139330566514]
Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications. LVLMs inherit susceptibility to hallucinations, a significant concern in high-stakes medical contexts. We introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation. We also present MediHallDetector, a novel Medical LVLM engineered for precise hallucination detection.
arXiv Detail & Related papers (2024-06-14T17:14:22Z)
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM [48.16696073640864]
We introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. All images in this benchmark are sourced from authentic medical scenarios. We have found that existing LVLMs struggle to address these medical VQA problems effectively.
arXiv Detail & Related papers (2024-02-14T13:51:56Z)
Hallucination Benchmark in Medical Visual Question Answering [2.4302611783073145]
We created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of the state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
arXiv Detail & Related papers (2024-01-11T10:52:17Z)
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models [110.12460299261531]
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. We propose a powerful algorithm, LVLM Hallucination Revisor (LURE), to rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions.
arXiv Detail & Related papers (2023-10-01T18:10:53Z)
Evaluation and Analysis of Hallucination in Large Vision-Language Models [49.19829480199372]
Large Vision-Language Models (LVLMs) have recently achieved remarkable success. LVLMs are still plagued by the hallucination problem. Hallucination refers to information in LVLMs' responses that does not exist in the visual input.
arXiv Detail & Related papers (2023-08-29T08:51:24Z)
Med-HALT: Medical Domain Hallucination Test for Large Language Models [0.0]
This research paper focuses on the challenges posed by hallucinations in large language models (LLMs). We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations.
arXiv Detail & Related papers (2023-07-28T06:43:04Z)
Evaluating Object Hallucination in Large Vision-Language Models [122.40337582958453]
This work presents the first systematic study on object hallucination of large vision-language models (LVLMs). We find that LVLMs tend to generate objects that are inconsistent with the target images in the descriptions. We propose a polling-based query method called POPE to evaluate the object hallucination.
arXiv Detail & Related papers (2023-05-17T16:34:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.