Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
- URL: http://arxiv.org/abs/2602.21944v1
- Date: Wed, 25 Feb 2026 14:28:57 GMT
- Title: Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
- Authors: Haoran Li, Yuxin Lin, Huan Wang, Xiaoling Luo, Qi Zhu, Jiahua Shi, Huaming Chen, Bo Du, Johan Barthelemy, Zongyan Xue, Jun Shen, Yong Xu
- Abstract summary: Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide. Recent clinical practices leverage multi-view fundus images for DR detection with wide coverage of the field of view. We present MVGFDR, an end-to-end Multi-View Graph Fusion framework for DR grading.
- Score: 45.02913606252357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide, making early and accurate DR grading critical for timely intervention. Recent clinical practices leverage multi-view fundus images for DR detection with a wide coverage of the field of view (FOV), motivating deep learning methods to explore the potential of multi-view learning for DR grading. However, existing methods often overlook the inter-view correlations when fusing multi-view fundus images, failing to fully exploit the inherent consistency across views originating from the same patient. In this work, we present MVGFDR, an end-to-end Multi-View Graph Fusion framework for DR grading. Different from existing methods that directly fuse visual features from multiple views, MVGFDR is equipped with a novel Multi-View Graph Fusion (MVGF) module to explicitly disentangle the shared and view-specific visual features. Specifically, MVGF comprises three key components: (1) Multi-view Graph Initialization, which constructs visual graphs via residual-guided connections and employs Discrete Cosine Transform (DCT) coefficients as frequency-domain anchors; (2) Multi-view Graph Fusion, which integrates selective nodes across multi-view graphs based on frequency-domain relevance to capture complementary view-specific information; and (3) Masked Cross-view Reconstruction, which leverages masked reconstruction of shared information across views to facilitate view-invariant representation learning. Extensive experimental results on MFIDDR, by far the largest multi-view fundus image dataset, demonstrate the superiority of our proposed approach over existing state-of-the-art methods in diabetic retinopathy grading.
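The abstract's frequency-domain anchoring idea can be sketched in a small, self-contained toy: each graph node carries the low-frequency 2D-DCT coefficients of an image patch as its anchor, and cross-view node pairs are ranked by anchor similarity before fusion. This is an illustrative sketch under our own assumptions, not the authors' implementation; `dct_anchor`, `fuse_views`, and the `keep`/`top_k` parameters are hypothetical names introduced here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows = frequencies)."""
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1 / np.sqrt(2)          # normalize the DC row
    return basis * np.sqrt(2 / n)

def dct_anchor(patch: np.ndarray, keep: int = 4) -> np.ndarray:
    """Flattened low-frequency block of the 2D DCT of a square patch.

    `keep` (hypothetical parameter) controls how many low-frequency
    rows/columns of coefficients form the anchor.
    """
    n = patch.shape[0]
    D = dct_matrix(n)
    coeffs = D @ patch @ D.T            # separable 2D DCT-II
    return coeffs[:keep, :keep].ravel()

def fuse_views(view_a, view_b, top_k: int = 2):
    """Rank cross-view node pairs by cosine similarity of their DCT anchors.

    view_a, view_b: lists of square patches (one per graph node).
    Returns the top_k (i, j) index pairs, most frequency-relevant first.
    """
    A = np.stack([dct_anchor(p) for p in view_a])
    B = np.stack([dct_anchor(p) for p in view_b])
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    sim = A @ B.T                       # pairwise cosine similarity
    order = np.argsort(sim, axis=None)[::-1][:top_k]
    return [tuple(np.unravel_index(i, sim.shape)) for i in order]
```

In this toy, a node whose patch reappears (up to low-frequency content) in another view is selected first, mirroring the paper's stated goal of fusing selective nodes by frequency-domain relevance; the residual-guided graph construction and masked reconstruction stages are omitted.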
Related papers
- Multi-modal and Multi-view Fundus Image Fusion for Retinopathy Diagnosis via Multi-scale Cross-attention and Shifted Window Self-attention [4.076237636695921]
The joint interpretation of multi-modal and multi-view fundus images is critical for retinopathy prevention. We propose a multi-modal fundus image fusion method based on multi-scale cross-attention. We also design a retinopathy diagnosis framework to help ophthalmologists reduce workload and improve diagnostic accuracy.
arXiv Detail & Related papers (2025-04-12T07:06:15Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer [0.257133335028485]
We propose an innovative multi-view network based on transformers to address challenges in mammographic image classification.
Our approach introduces a novel shifted window-based dynamic attention block, facilitating the effective integration of multi-view information.
arXiv Detail & Related papers (2024-02-26T04:41:04Z)
- Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features [5.660131312162423]
Parkinson's Disease (PD) affects millions globally, impacting movement.
Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure.
This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification.
arXiv Detail & Related papers (2023-11-25T02:32:46Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network.
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation [42.804058630251305]
We propose the first multi-view medical report generation model, called MvCo-DoT.
MvCo-DoT first proposes a multi-view contrastive learning (MvCo) strategy to help the deep-reinforcement-learning-based model exploit the consistency of multi-view inputs.
Extensive experiments on the IU X-Ray public dataset show that MvCo-DoT outperforms the SOTA medical report generation baselines in all metrics.
arXiv Detail & Related papers (2023-04-15T03:42:26Z)
- Multi-Scale Relational Graph Convolutional Network for Multiple Instance Learning in Histopathology Images [2.6663738081163726]
We introduce the Multi-Scale Relational Graph Convolutional Network (MS-RGCN) as a multiple instance learning method.
We model histopathology image patches and their relation with neighboring patches and patches at other scales as a graph.
We experiment on prostate cancer histopathology images to predict magnification groups based on the extracted features from patches.
arXiv Detail & Related papers (2022-12-17T02:26:42Z)
- TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers [8.139069987207494]
We present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms.
In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining.
arXiv Detail & Related papers (2022-03-21T04:02:54Z)
- Act Like a Radiologist: Towards Reliable Multi-view Correspondence Reasoning for Mammogram Mass Detection [49.14070210387509]
We propose an Anatomy-aware Graph convolutional Network (AGN) for mammogram mass detection.
AGN is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability.
Experiments on two standard benchmarks reveal that AGN significantly exceeds the state-of-the-art performance.
arXiv Detail & Related papers (2021-05-21T06:48:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.