FusionFM: Fusing Eye-specific Foundational Models for Optimized Ophthalmic Diagnosis
- URL: http://arxiv.org/abs/2508.11721v1
- Date: Fri, 15 Aug 2025 01:17:52 GMT
- Authors: Ke Zou, Jocelyn Hui Lin Goh, Yukun Zhou, Tian Lin, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Rui Santos, Gabor M. Somfai, Huazhu Fu, Haoyu Chen, Pearse A. Keane, Ching-Yu Cheng, Yih Chung Tham
- Abstract summary: Foundation models (FMs) have shown great promise in medical image analysis by improving generalization across diverse downstream tasks. To our knowledge, this is the first study to systematically evaluate both single and fused ophthalmic FMs. We benchmarked four state-of-the-art FMs using standardized datasets from multiple countries and evaluated their performance using AUC and F1 metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FMs) have shown great promise in medical image analysis by improving generalization across diverse downstream tasks. In ophthalmology, several FMs have recently emerged, but there is still no clear answer to fundamental questions: Which FM performs the best? Are they equally good across different tasks? What if we combine all FMs together? To our knowledge, this is the first study to systematically evaluate both single and fused ophthalmic FMs. To address these questions, we propose FusionFM, a comprehensive evaluation suite, along with two fusion approaches to integrate different ophthalmic FMs. Our framework covers both ophthalmic disease detection (glaucoma, diabetic retinopathy, and age-related macular degeneration) and systemic disease prediction (diabetes and hypertension) based on retinal imaging. We benchmarked four state-of-the-art FMs (RETFound, VisionFM, RetiZero, and DINORET) using standardized datasets from multiple countries and evaluated their performance using AUC and F1 metrics. Our results show that DINORET and RetiZero achieve superior performance in both ophthalmic and systemic disease tasks, with RetiZero exhibiting stronger generalization on external datasets. Regarding fusion strategies, the gating-based approach provides modest improvements in predicting glaucoma, AMD, and hypertension. Despite these advances, predicting systemic diseases, especially hypertension in external cohorts, remains challenging. These findings provide an evidence-based evaluation of ophthalmic FMs, highlight the benefits of model fusion, and point to strategies for enhancing their clinical applicability.
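The abstract describes a gating-based strategy for fusing several foundation models. The paper does not publish implementation details here, but the general idea can be sketched as follows: each FM produces a feature embedding for an image, a gate assigns a weight to each model, and the fused representation is the weighted sum. All names, dimensions, and the fixed gate scores below are illustrative assumptions, not the authors' code; in practice the gate scores would come from a small learned network conditioned on the input.

```python
import math

def softmax(scores):
    """Normalize raw gate scores into weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(embeddings, gate_scores):
    """Fuse per-model embeddings via a softmax-weighted sum.

    embeddings  -- list of equal-length feature vectors, one per FM
    gate_scores -- one raw score per FM (here fixed for illustration;
                   in a trained system these are predicted per input)
    """
    weights = softmax(gate_scores)
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for w, emb in zip(weights, embeddings):
        for i in range(dim):
            fused[i] += w * emb[i]
    return fused

# Toy example: three hypothetical FMs emitting 4-dimensional embeddings.
embs = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]]
fused = gated_fusion(embs, gate_scores=[2.0, 1.0, 1.0])
print([round(v, 3) for v in fused])
```

The fused vector would then feed a task head (e.g. a linear classifier for glaucoma vs. normal); the gate lets the system lean on whichever FM is most informative for a given input.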
Related papers
- Information-driven Fusion of Pathology Foundation Models for Enhanced Disease Characterization [0.26249027950824505]
Foundation models (FMs) have demonstrated strong performance across diverse pathology tasks. We propose an information-driven, intelligent fusion strategy for integrating multiple FMs into a unified representation. Our findings suggest that intelligent, correlation-guided fusion of pathology FMs can yield compact, task-tailored representations.
arXiv Detail & Related papers (2025-12-11T20:38:03Z) - MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z) - Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection [0.19514194744184568]
Diabetic Macular Edema (DME) is a leading cause of vision loss among patients with Diabetic Retinopathy (DR). Deep learning has shown promising results for automatically detecting this condition from fundus images. It is unclear if Foundation Models (FM) can cope with DME detection in particular.
arXiv Detail & Related papers (2025-10-08T17:41:02Z) - TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models [54.48710348910535]
Existing medical reasoning benchmarks primarily focus on analyzing a patient's condition based on an image from a single visit. We introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits.
arXiv Detail & Related papers (2025-09-29T17:51:26Z) - From Promise to Practical Reality: Transforming Diffusion MRI Analysis with Fast Deep Learning Enhancement [55.64033992736822]
FastFOD-Net is an end-to-end deep learning framework enhancing FODs with superior performance and delivering training/inference efficiency for clinical use. This work will facilitate the more widespread adoption of, and build clinical trust in, deep learning based methods for diffusion MRI enhancement.
arXiv Detail & Related papers (2025-08-13T17:56:29Z) - AdaFusion: Prompt-Guided Inference with Adaptive Fusion of Pathology Foundation Models [49.550545038402184]
We propose AdaFusion, a novel prompt-guided inference framework. Our method compresses and aligns tile-level features from diverse models. AdaFusion consistently surpasses individual PFMs across both classification and regression tasks.
arXiv Detail & Related papers (2025-08-07T07:09:31Z) - EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model. We propose the Eyecare Kit, which tackles the aforementioned three key challenges with a tailored dataset, benchmark, and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z) - Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models [6.176432104264649]
Vision-language models (VLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored. We propose Med-R1, a reinforcement learning (RL)-enhanced vision-language model designed to improve generalization and reliability in medical reasoning. We evaluate Med-R1 across eight distinct medical imaging modalities.
arXiv Detail & Related papers (2025-03-18T06:12:38Z) - Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases? [15.146396276161937]
RETFound and DINOv2 models were evaluated for ocular disease detection and systemic disease prediction tasks. RETFound achieved superior performance over all DINOv2 models in predicting heart failure, infarction, and ischaemic stroke.
arXiv Detail & Related papers (2025-02-10T09:31:39Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - ELF: An End-to-end Local and Global Multimodal Fusion Framework for Glaucoma Grading [43.12236694270165]
We propose an end-to-end local and global multi-modal fusion framework for glaucoma grading named ELF.
ELF can fully utilize the complementary information between fundus and OCT.
Extensive experiments on the multi-modal glaucoma grading GAMMA dataset demonstrate the effectiveness of ELF.
arXiv Detail & Related papers (2023-11-14T09:51:00Z) - VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [27.92420837559191]
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed basic- and intermediate-level ophthalmologists in jointly diagnosing 12 common ophthalmic diseases.
arXiv Detail & Related papers (2023-10-08T03:40:14Z) - AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.