Related papers: InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion

URL: http://arxiv.org/abs/2507.12669v1
Date: Wed, 16 Jul 2025 23:00:10 GMT
Title: InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion
Authors: Ananya Raghu, Anisha Raghu, Alice S. Tang, Yannis M. Paulus, Tyson N. Kim, Tomiko T. Oskotsky,
Abstract summary: Age-related macular degeneration, glaucoma, diabetic retinopathy (DR), diabetic macular edema, and pathological myopia affect hundreds of millions of people worldwide. We develop InSight, an AI-based app that combines patient metadata with fundus images for accurate diagnosis of five common eye diseases.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Background/Objectives: Age-related macular degeneration, glaucoma, diabetic retinopathy (DR), diabetic macular edema, and pathological myopia affect hundreds of millions of people worldwide. Early screening for these diseases is essential, yet access to medical care remains limited in low- and middle-income countries as well as in resource-limited settings. We develop InSight, an AI-based app that combines patient metadata with fundus images for accurate diagnosis of five common eye diseases to improve accessibility of screenings. Methods: InSight features a three-stage pipeline: real-time image quality assessment, a disease diagnosis model, and a DR grading model to assess severity. Our disease diagnosis model incorporates three key innovations: (a) a multimodal fusion technique (MetaFusion) combining clinical metadata and images; (b) a pretraining method leveraging supervised and self-supervised loss functions; and (c) a multitask model to simultaneously predict 5 diseases. We make use of the BRSET (lab-captured images) and mBRSET (smartphone-captured images) datasets, both of which also contain clinical metadata, for model training and evaluation. Results: Trained on a dataset of BRSET and mBRSET images, the image quality checker achieves near-100% accuracy in filtering out low-quality fundus images. The multimodal pretrained disease diagnosis model outperforms models using only images by 6% in balanced accuracy for BRSET and 4% for mBRSET. Conclusions: The InSight pipeline demonstrates robustness across varied image conditions and has high diagnostic accuracy across all five diseases, generalizing to both smartphone- and lab-captured images. The multitask model contributes to the lightweight nature of the pipeline, making it five times more computationally efficient than having five individual models, one for each disease.
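The results above are reported in balanced accuracy, which is the mean of per-class recall and so is not inflated by a dominant "healthy" class. The following is a minimal sketch of that standard definition in plain Python. It is not the authors' evaluation code, and the toy labels and the helper name `balanced_accuracy` are made up for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 8 healthy eyes (0) and 2 diseased eyes (1).
# A degenerate model that always predicts "healthy" looks good on plain
# accuracy but not on balanced accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {bal_accuracy:.2f}")
```

In this toy case plain accuracy is 0.80 while balanced accuracy is 0.50, which is why the metric suits screening datasets where most images show no disease.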

Related papers

iMedImage Technical Report [5.0953390013898705]
Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging. We developed iMedImage, an end-to-end model for general medical image recognition, demonstrating strong performance across multiple imaging tasks.
arXiv Detail & Related papers (2025-03-27T03:25:28Z)
Evaluation of Vision Transformers for Multimodal Image Classification: A Case Study on Brain, Lung, and Kidney Tumors [0.0]
The work evaluates the performance of Vision Transformer architectures, including Swin Transformer and MaxViT, on several datasets of MRI and CT scans. The results revealed that the Swin Transformer provided high accuracy, achieving up to 99% on average for individual datasets and 99.4% accuracy for the combined dataset.
arXiv Detail & Related papers (2025-02-08T10:35:51Z)
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision. This study evaluated the performance of Gemini, GPT-4, and 4 popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
Lightening Anything in Medical Images [23.366303785451684]
We introduce a pioneering training-free Diffusion Model for Universal Medical Image Enhancement, named UniMIE. UniMIE demonstrates its unsupervised enhancement capabilities across various medical image modalities without the need for any fine-tuning. We conduct a comprehensive evaluation on 13 imaging modalities and over 15 medical types, demonstrating better qualities, robustness, and accuracy than other modality-specific and data-inefficient models.
arXiv Detail & Related papers (2024-06-01T05:07:50Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Improved Automatic Diabetic Retinopathy Severity Classification Using Deep Multimodal Fusion of UWF-CFP and OCTA Images [1.6449510885987357]
Diabetic Retinopathy (DR), a prevalent and severe complication of diabetes, affects millions of individuals globally. Recent advancements in imaging technologies provide opportunities for the early detection of DR but also pose significant challenges. This study introduces a novel multimodal approach that leverages these imaging modalities to notably enhance DR classification.
arXiv Detail & Related papers (2023-10-03T09:35:38Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling. Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is for the volume metadata to be lost for multiple reasons. It is possible to build a multi-image to single-diagnosis model that mimics a human doctor's diagnostic process.
arXiv Detail & Related papers (2020-10-20T04:30:09Z)
