Related papers: MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

URL: http://arxiv.org/abs/2406.11451v2
Date: Tue, 18 Jun 2024 14:20:46 GMT
Title: MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More
Authors: Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang
Abstract summary: Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks. LVLMs suffer from significant model hallucination issues. In this paper, we introduce a method that mimics human cognitive processes to construct fine-grained instruction pairs.
Score: 20.59298361626719
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant model hallucination issues. This severely impairs the model's generative accuracy, making it challenging for LVLMs to be implemented in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effective way to address model hallucination. Moreover, the limited availability of training data in the medical field and privacy concerns greatly hinder the model's accuracy and generalization capabilities. In this paper, we introduce a method that mimics human cognitive processes to construct fine-grained instruction pairs and apply the concept of chain-of-thought (CoT) from inference scenarios to training scenarios, thereby proposing a method called MedThink. Our experiments on various LVLMs demonstrate that our novel data construction method tailored for the medical domain significantly improves the model's performance in medical image report generation tasks and substantially mitigates the hallucinations. All resources of this work will be released soon.

Related papers

REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis [6.446611581074913]
We introduce REMEMBER (Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning), a new machine learning framework that facilitates zero- and few-shot Alzheimer's diagnosis using brain MRI scans. Experimental results demonstrate that REMEMBER achieves robust zero- and few-shot performance.
arXiv Detail & Related papers (2025-04-12T22:06:15Z)
The Multi-Round Diagnostic RAG Framework for Emulating Clinical Reasoning [10.483453944197407]
We construct DiagnosGraph, a knowledge graph covering both modern medicine and Traditional Chinese Medicine. To bridge the gap between colloquial patient narratives and academic medical knowledge, DiagnosGraph also introduces 1,908 medical records. Experiments conducted on four medical benchmarks, with evaluations by human physicians, demonstrate that MRD-RAG enhances the diagnostic performance of LLMs.
arXiv Detail & Related papers (2025-04-10T13:17:51Z)
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain. Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
MINDSETS: Multi-omics Integration with Neuroimaging for Dementia Subtyping and Effective Temporal Study [0.7751705157998379]
Alzheimer's disease (AD) and vascular dementia (VaD) are the two most prevalent dementia types. This paper presents an innovative multi-omics approach to accurately differentiate AD from VaD, achieving a diagnostic accuracy of 89.25%.
arXiv Detail & Related papers (2024-11-06T10:13:28Z)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision. This study evaluated the performance of Gemini, GPT-4, and four popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis [28.421857904824627]
MiniGPT-Med is a vision-language model derived from large-scale language models and tailored for medical applications. It is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. It achieves state-of-the-art performance on medical report generation, exceeding the previous best model by 19% in accuracy.
arXiv Detail & Related papers (2024-07-04T18:21:10Z)
Towards Knowledge-Infused Automated Disease Diagnosis Assistant [14.150224660741939]
We build a diagnosis assistant to assist doctors, which identifies diseases based on patient-doctor interaction. We propose a two-channel, discourse-aware disease diagnosis model (KI-DDI), where the first channel encodes patient-doctor communication. In the next stage, the conversation and knowledge graph embeddings are infused together and fed to a deep neural network for disease identification.
arXiv Detail & Related papers (2024-05-18T05:18:50Z)
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction [7.5569033426158585]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation. We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge. Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
arXiv Detail & Related papers (2023-08-28T06:05:18Z)
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation [7.508437260320598]
We propose diagnosis-driven prompts for medical report generation (PromptMRG). PromptMRG is based on an encoder-decoder architecture with an extra disease classification branch. Cross-modal feature enhancement retrieves similar reports from the database to assist the diagnosis of a query image.
arXiv Detail & Related papers (2023-08-24T07:10:31Z)
Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance. Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports that reliably describe lesion areas. We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [150.52617238140868]
We propose low-resource medical dialogue generation to transfer the diagnostic experience from source diseases to target ones. We also develop a Graph-Evolving Meta-Learning framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease.
arXiv Detail & Related papers (2020-12-22T13:20:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.