Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis
- URL: http://arxiv.org/abs/2602.03622v1
- Date: Tue, 03 Feb 2026 15:13:57 GMT
- Title: Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis
- Authors: Lu Zhang, Huizhen Yu, Zuowei Wang, Fu Gui, Yatu Guo, Wei Zhang, Mengyu Jia
- Abstract summary: A unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
- Score: 4.437523386839875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged in terms of data heterogeneity, potential invasiveness, registration complexity, and so on. As such, a unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporates fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art ones in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy: 0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
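The pipeline the abstract describes, independent modality-specific encoders whose features are adaptively calibrated before fusion, can be illustrated with a minimal sketch. The softmax gating below is a hypothetical stand-in for the paper's calibration module, not its actual implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def calibrate_and_fuse(features, gate_logits):
    """Fuse M modality-specific embeddings with softmax gates.

    features: (M, d) array, one row per modality (e.g. fundus, FFA, MSI).
    gate_logits: (M,) per-modality relevance scores (learned in practice;
        fixed toy values here).
    Returns the gated sum sum_m alpha_m * f_m and the gate weights alpha.
    """
    alpha = softmax(gate_logits)
    return alpha @ features, alpha

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8))    # 3 modalities, 8-dim embeddings
logits = np.array([0.5, 2.0, -1.0])    # toy gate scores: modality 1 dominates
fused, alpha = calibrate_and_fuse(feats, logits)
```

Because the gates sum to one, the fusion performs the "information pruning" described above by down-weighting modalities with low relevance scores.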
Related papers
- Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis [7.945705180020063]
We propose a knowledge-enhanced joint embedding framework that integrates retinal fundus images, clinical text, and structured patient data. Our framework achieves near-perfect text-to-image retrieval performance with Recall@1 of 99.94% compared to fine-tuned CLIP's 1.29%.
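The Recall@1 metric quoted here asks, for each text query, whether its highest-scoring image is the ground-truth match. A minimal sketch, assuming a precomputed similarity matrix whose diagonal holds the true pairs (the framework's own scoring may differ):

```python
import numpy as np

def recall_at_1(sim):
    """sim[i, j]: similarity of text query i to image j; the true match
    for query i is image i (the diagonal). Returns the fraction of
    queries whose top-ranked image is correct."""
    top1 = sim.argmax(axis=1)
    return float((top1 == np.arange(sim.shape[0])).mean())

sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.2, 0.7, 0.4]])
# queries 0 and 1 rank their true image first; query 2 does not
recall_at_1(sim)  # -> 0.666...
```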
arXiv Detail & Related papers (2025-12-22T18:41:45Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning [11.158864816564538]
We present an innovative multi-modality representation learning framework based on the 'Align-Disentangle-Fusion' mechanism. Our method consistently outperforms state-of-the-art approaches, achieving superior accuracy across diverse real clinical scenarios.
arXiv Detail & Related papers (2025-08-23T03:02:51Z) - Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport [28.96009174108652]
Multimodal ophthalmic imaging-based diagnosis integrates color fundus images with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. Existing commonly used pipelines, such as modality imputation and distillation methods, face notable limitations. We propose a novel multimodal alignment and fusion framework capable of robustly handling missing modalities in ophthalmic diagnostic tasks.
arXiv Detail & Related papers (2025-07-07T13:36:39Z) - Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation [30.697291934309206]
Multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. The authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework.
arXiv Detail & Related papers (2025-03-07T10:58:38Z) - CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data [47.2764293508916]
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data.
One obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost.
We propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention.
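The contrastive-learning component described above typically pulls paired cross-omics embeddings together while pushing mismatched pairs apart. A minimal NumPy sketch of a symmetric InfoNCE-style objective, a common choice for this kind of alignment (the paper's exact loss may differ):

```python
import numpy as np

def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE over paired embeddings from two omics views.

    za, zb: (N, d) arrays; row i of za is the positive pair of row i of zb.
    """
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature       # (N, N) scaled cosine similarities
    idx = np.arange(len(za))

    def xent(lg):                          # cross-entropy with diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(1)
za = rng.standard_normal((16, 8))
aligned = info_nce(za, za)        # correctly paired views: low loss
shuffled = info_nce(za, za[::-1]) # mispaired views: high loss
```

Minimizing this loss drives the two views of the same sample toward a shared embedding, which is what makes downstream integration of unpaired or incomplete omics data tractable.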
arXiv Detail & Related papers (2023-04-12T00:22:18Z) - Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by 4.2% to 9.4%.
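The Dice-Sorensen coefficient cited above measures overlap between a predicted mask and the ground truth. A standard implementation, not specific to CHASe:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice-Sorensen coefficient 2*|P & T| / (|P| + |T|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0],
              [0, 1, 0]])
b = np.array([[1, 0, 0],
              [0, 1, 1]])
dice_coefficient(a, b)  # 2*2 / (3+3) = 0.666...
```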
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning [48.05232274463484]
Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world.
Due to the large number of affected patients and the heavy workload for doctors, computer-aided diagnosis with machine learning algorithms is urgently needed.
In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images.
arXiv Detail & Related papers (2020-05-06T15:19:15Z) - Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-task Learning [29.46896757506273]
Images annotated by well-qualified doctors are very expensive, and only a limited amount of data is available for various retinal diseases.
Some studies show that AMD and DR share common features like hemorrhagic points and exudation, but most classification algorithms only train those disease models independently.
We propose a method called synergic adversarial label learning (SALL), which leverages relevant retinal disease labels in both semantic and feature space as additional signals and trains the model in a collaborative manner.
arXiv Detail & Related papers (2020-03-24T01:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.