Related papers: MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Related papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge.<n>We then introduce our medical-specialized MLLM: Lingshu.<n>Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning [63.63542462400175]
We propose MMedAgent-RL, a reinforcement learning-based multi-agent framework that enables dynamic, optimized collaboration among medical agents.<n> Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments from multi-specialists.<n>Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs, but also exhibits human-like reasoning patterns.
arXiv Detail & Related papers (2025-05-31T13:22:55Z)
DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients [4.062920795080979]
Existing medical RAG systems mainly leverage knowledge from medical knowledge bases.<n>We propose DoctorRAG, a RAG framework that emulates doctor-like reasoning by integrating both explicit clinical knowledge and implicit case-based experience.<n>Our approach generates more accurate, relevant, and comprehensive responses, taking a step towards more doctor-like medical reasoning systems.
arXiv Detail & Related papers (2025-05-26T05:56:23Z)
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation [25.070570876478417]
Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings.<n>Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency.<n>Current Medical LMMs (MLMMs) still face two major challenges: (1) Insufficient region-level understanding and interaction, and (2) Limited interpretability due to single-step reasoning.
arXiv Detail & Related papers (2025-05-05T17:57:07Z)
Zero-shot Domain Generalization of Foundational Models for 3D Medical Image Segmentation: An Experimental Study [15.3909625201792]
Foundations models (FMs) trained on diverse large-scale data hold promise for zero-shot generalization. In this study, we examine their ability towards domain generalization (DG) by conducting a comprehensive experimental study. Our findings reveal the potential of promptable FMs in bridging the domain gap via smart prompting techniques.
arXiv Detail & Related papers (2025-03-28T20:33:41Z)
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain. Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
Glioma Multimodal MRI Analysis System for Tumor Layered Diagnosis via Multi-task Semi-supervised Learning [9.665261760136032]
Gliomas are the most common primary tumors of the central nervous system. In this study, we propose a Glioma Multimodal MRI Analysis System (GMMAS) that utilize a deep learning network for processing multiple events simultaneously.
arXiv Detail & Related papers (2025-01-29T16:50:04Z)
KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis [6.001401133840334]
KG4Diagnosis is a novel hierarchical multi-agent framework that combines Large Language Models with automated knowledge graph construction. Our framework mirrors real-world medical systems through a two-tier architecture: a general practitioner (GP) agent for initial assessment and triage, coordinating with specialized agents for in-depth diagnosis in specific domains.
arXiv Detail & Related papers (2024-12-22T02:40:59Z)
KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models. In particular, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNTER as the target foundation model. Within the hidden layer, the hierarchical attention mechanisms are designed to achieve adaptive merging of the target model to the hidden layer feature knowledge of all experts.
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks. We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD) LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation? [10.20366295974822]
We introduce a novel decode head architecture, HQHSAM, which simply integrates elements from two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation performance. Our experiments on multiple datasets, encompassing various anatomies and modalities, reveal that FMs, particularly with the HQHSAM decode head, improve domain generalization for medical image segmentation.
arXiv Detail & Related papers (2024-09-12T11:41:35Z)
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians. Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision [17.583536041845402]
We present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. We compiled 37 open-access, mostly categorical fundus imaging datasets from various sources. We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference.
arXiv Detail & Related papers (2023-08-15T17:39:52Z)
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation [38.61227663176952]
We propose a shift towards universal medical image segmentation, a paradigm aiming to build medical image understanding foundation models. We develop Hermes, a novel context-prior learning approach to address the challenges of data heterogeneity and annotation differences in medical image segmentation.
arXiv Detail & Related papers (2023-06-04T17:39:08Z)
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks. We propose a new paradigm called Medical-knedge-enhanced mulTimOdal pretRaining (MOTOR)
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation. The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristic across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.