Hallucination Benchmark in Medical Visual Question Answering
- URL: http://arxiv.org/abs/2401.05827v2
- Date: Wed, 3 Apr 2024 12:42:32 GMT
- Title: Hallucination Benchmark in Medical Visual Question Answering
- Authors: Jinge Wu, Yunsoo Kim, Honghan Wu
- Abstract summary: We created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models.
The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
- Score: 2.4302611783073145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success of large language and vision models (LLVMs) on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has shown great potential for realizing effective visual assistants in healthcare. However, these models have not been extensively tested for the hallucination phenomenon in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
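As a rough illustration of the setup the abstract describes, the sketch below shows how a benchmark of medical images paired with question-answer sets might be represented and scored. The field names, file format, and the `model.answer` interface are assumptions for illustration only, not the paper's released data schema or evaluation code.

```python
# Minimal sketch of a hallucination-benchmark evaluation loop
# (hypothetical schema, not the paper's actual data format or protocol).
import json
from dataclasses import dataclass
from typing import List


@dataclass
class BenchmarkItem:
    image_path: str  # path to the medical image
    question: str    # e.g. "Is a pleural effusion visible in this chest X-ray?"
    answer: str      # reference answer, e.g. "no"


def load_benchmark(path: str) -> List[BenchmarkItem]:
    """Load items from a JSON list of {"image", "question", "answer"} records (assumed format)."""
    with open(path) as f:
        return [BenchmarkItem(r["image"], r["question"], r["answer"]) for r in json.load(f)]


def exact_match_accuracy(model, items: List[BenchmarkItem]) -> float:
    """Score a model exposing answer(image_path, question) -> str via exact string match."""
    correct = sum(
        model.answer(it.image_path, it.question).strip().lower() == it.answer.strip().lower()
        for it in items
    )
    return correct / len(items)
```

Different prompting strategies, such as those compared in the paper, could then be tested by changing how the question string is templated before it is passed to the model.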
Related papers
- H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models [0.0]
We propose H-POPE, a coarse-to-fine-grained benchmark that assesses hallucinations in object existence and attributes.
Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes.
arXiv Detail & Related papers (2024-11-06T17:55:37Z)
- Explore the Hallucination on Low-level Perception for MLLMs [83.12180878559295]
We aim to define and evaluate the self-awareness of MLLMs in low-level visual perception and understanding tasks.
We present QL-Bench, a benchmark setting to simulate human responses to low-level vision.
We demonstrate that while some models exhibit robust low-level visual capabilities, their self-awareness remains relatively underdeveloped.
arXiv Detail & Related papers (2024-09-15T14:38:29Z)
- MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context [21.562034852024272]
Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data.
Despite their advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets.
We introduce a new benchmark dataset, the Medical Visual Hallucination Test (MedVH), to evaluate the hallucination of domain-specific LVLMs.
arXiv Detail & Related papers (2024-07-03T00:59:03Z)
- Detecting and Evaluating Medical Hallucinations in Large Vision Language Models [22.30139330566514]
Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications.
However, LVLMs inherit a susceptibility to hallucinations, a significant concern in high-stakes medical contexts.
We introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation.
We also present MediHallDetector, a novel Medical LVLM engineered for precise hallucination detection.
arXiv Detail & Related papers (2024-06-14T17:14:22Z)
- Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models [57.42800112251644]
We focus on a specific type of hallucination: number hallucination, in which models incorrectly identify the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z)
- On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study [13.972931873011914]
Large language models (LLMs) have taken the spotlight in natural language processing.
Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.
arXiv Detail & Related papers (2024-02-21T23:01:38Z)
- OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM [48.16696073640864]
We introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark.
All images in this benchmark are sourced from authentic medical scenarios.
We have found that existing LVLMs struggle to address these medical VQA problems effectively.
arXiv Detail & Related papers (2024-02-14T13:51:56Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models [67.8024390595066]
NOPE (Negative Object Presence Evaluation) is a novel benchmark designed to assess object hallucination in vision-language (VL) models.
We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions.
arXiv Detail & Related papers (2023-10-09T01:52:27Z)
- Artificial General Intelligence for Medical Imaging Analysis [92.3940918983821]
Large-scale Artificial General Intelligence (AGI) models have achieved unprecedented success in a variety of general domain tasks.
These models face notable challenges arising from the medical field's inherent complexities and unique characteristics.
This review aims to offer insights into the future implications of AGI in medical imaging, healthcare, and beyond.
arXiv Detail & Related papers (2023-06-08T18:04:13Z)
- A Question-Centric Model for Visual Question Answering in Medical Imaging [3.619444603816032]
We present a novel Visual Question Answering approach that allows an image to be queried by means of a written question.
Experiments on a variety of medical and natural image datasets show that by fusing image and question features in a novel way, the proposed approach achieves an equal or higher accuracy compared to current methods.
arXiv Detail & Related papers (2020-03-02T10:16:16Z)
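For the question-centric fusion idea in the last entry above, the sketch below shows one generic, hypothetical way to fuse image and question features for answer classification (concatenation followed by an MLP); it is not the fusion scheme proposed in that paper, whose details are not given here.

```python
# Generic image-question feature fusion for VQA (illustrative only; not the cited paper's method).
import torch
import torch.nn as nn


class ConcatFusionVQA(nn.Module):
    def __init__(self, img_dim: int, q_dim: int, hidden_dim: int, num_answers: int):
        super().__init__()
        # Fuse precomputed image and question feature vectors, then classify over an answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + q_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, img_feat: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([img_feat, q_feat], dim=-1)  # concatenation as the fusion step
        return self.classifier(fused)
```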