Related papers: A Survey of Multimodal Ophthalmic Diagnostics: From Task-Specific Approaches to Foundational Models

A Survey of Multimodal Ophthalmic Diagnostics: From Task-Specific Approaches to Foundational Models

URL: http://arxiv.org/abs/2508.03734v1
Date: Thu, 31 Jul 2025 10:49:21 GMT
Title: A Survey of Multimodal Ophthalmic Diagnostics: From Task-Specific Approaches to Foundational Models
Authors: Xiaoling Luo, Ruli Zheng, Qiaojian Zheng, Zibo Du, Shuo Yang, Meidan Ding, Qihao Xu, Chengliang Liu, Linlin Shen,
Abstract summary: The review focuses on two main categories: task-specific multimodal approaches and large-scale multimodal foundation models.<n>The survey critically examines important datasets, evaluation metrics, and methodological innovations.<n>It also discusses ongoing challenges such as variability in data, limited annotations, lack of interpretability, and issues with generalizability across different patient populations.
Score: 28.34025112894094
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual impairment represents a major global health challenge, with multimodal imaging providing complementary information that is essential for accurate ophthalmic diagnosis. This comprehensive survey systematically reviews the latest advances in multimodal deep learning methods in ophthalmology up to the year 2025. The review focuses on two main categories: task-specific multimodal approaches and large-scale multimodal foundation models. Task-specific approaches are designed for particular clinical applications such as lesion detection, disease diagnosis, and image synthesis. These methods utilize a variety of imaging modalities including color fundus photography, optical coherence tomography, and angiography. On the other hand, foundation models combine sophisticated vision-language architectures and large language models pretrained on diverse ophthalmic datasets. These models enable robust cross-modal understanding, automated clinical report generation, and decision support. The survey critically examines important datasets, evaluation metrics, and methodological innovations including self-supervised learning, attention-based fusion, and contrastive alignment. It also discusses ongoing challenges such as variability in data, limited annotations, lack of interpretability, and issues with generalizability across different patient populations. Finally, the survey outlines promising future directions that emphasize the use of ultra-widefield imaging and reinforcement learning-based reasoning frameworks to create intelligent, interpretable, and clinically applicable AI systems for ophthalmology.

Related papers

From Pixels to Polygons: A Survey of Deep Learning Approaches for Medical Image-to-Mesh Reconstruction [38.67693323186832]
This survey systematically categorizes existing approaches into four main categories: template models, statistical models, generative models, and implicit models.<n>We provide an extensive evaluation of these methods across various anatomical applications, from cardiac imaging to neurological studies.<n>The survey identifies current challenges in the field, including requirements for topological correctness, geometric accuracy, and multi-modality integration.
arXiv Detail & Related papers (2025-05-06T15:01:43Z)
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare.<n>Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model.<n>We propose the Eyecare Kit, which tackles the aforementioned three key challenges with the tailored dataset, benchmark and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports. Based on this dataset, we focus on the challanging task of unsupervised pretraining. We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis [20.318178211934985]
We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million ophthalmology images with partial text data. EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases.
arXiv Detail & Related papers (2024-09-10T17:00:19Z)
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge. This variability directly impacts the development and evaluation of automated segmentation algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ)
arXiv Detail & Related papers (2024-03-19T17:57:24Z)
Deep Learning and Computer Vision for Glaucoma Detection: A Review [0.8379286663107844]
Glaucoma is the leading cause of irreversible blindness worldwide. Recent advances in computer vision and deep learning have demonstrated the potential for automated assessment. We survey recent studies on AI-based glaucoma diagnosis using fundus, optical coherence tomography, and visual field images.
arXiv Detail & Related papers (2023-07-31T09:49:51Z)
MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications. Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples. Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z)
Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images [52.527733226555206]
We investigate the use of four attribution methods to explain a multiple instance learning models. We study two datasets of acute myeloid leukemia with over 100 000 single cell images. We compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard.
arXiv Detail & Related papers (2023-03-15T14:00:11Z)
Recent advances and clinical applications of deep learning in medical image analysis [7.132678647070632]
We reviewed and summarized more than 200 recently published papers to provide a comprehensive overview of applying deep learning methods in various medical image analysis tasks. Especially, we emphasize the latest progress and contributions of state-of-the-art unsupervised and semi-supervised deep learning in medical images.
arXiv Detail & Related papers (2021-05-27T18:05:12Z)
Learning Binary Semantic Embedding for Histology Image Classification and Retrieval [56.34863511025423]
We propose a novel method for Learning Binary Semantic Embedding (LBSE) Based on the efficient and effective embedding, classification and retrieval are performed to provide interpretable computer-assisted diagnosis for histology images. Experiments conducted on three benchmark datasets validate the superiority of LBSE under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:36:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.