Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
- URL: http://arxiv.org/abs/2602.21944v1
- Date: Wed, 25 Feb 2026 14:28:57 GMT
- Title: Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
- Authors: Haoran Li, Yuxin Lin, Huan Wang, Xiaoling Luo, Qi Zhu, Jiahua Shi, Huaming Chen, Bo Du, Johan Barthelemy, Zongyan Xue, Jun Shen, Yong Xu
- Abstract summary: Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide. Recent clinical practices leverage multi-view fundus images for DR detection with wide coverage of the field of view. We present MVGFDR, an end-to-end Multi-View Graph Fusion framework for DR grading.
- Score: 45.02913606252357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diabetic retinopathy (DR) is one of the leading causes of vision loss worldwide, making early and accurate DR grading critical for timely intervention. Recent clinical practices leverage multi-view fundus images for DR detection with a wide coverage of the field of view (FOV), motivating deep learning methods to explore the potential of multi-view learning for DR grading. However, existing methods often overlook the inter-view correlations when fusing multi-view fundus images, failing to fully exploit the inherent consistency across views originating from the same patient. In this work, we present MVGFDR, an end-to-end Multi-View Graph Fusion framework for DR grading. Different from existing methods that directly fuse visual features from multiple views, MVGFDR is equipped with a novel Multi-View Graph Fusion (MVGF) module to explicitly disentangle the shared and view-specific visual features. Specifically, MVGF comprises three key components: (1) Multi-view Graph Initialization, which constructs visual graphs via residual-guided connections and employs Discrete Cosine Transform (DCT) coefficients as frequency-domain anchors; (2) Multi-view Graph Fusion, which integrates selective nodes across multi-view graphs based on frequency-domain relevance to capture complementary view-specific information; and (3) Masked Cross-view Reconstruction, which leverages masked reconstruction of shared information across views to facilitate view-invariant representation learning. Extensive experimental results on MFIDDR, by far the largest multi-view fundus image dataset, demonstrate the superiority of our proposed approach over existing state-of-the-art methods in diabetic retinopathy grading.
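The abstract's frequency-domain anchoring idea can be sketched in a small, self-contained toy: each graph node carries the low-frequency 2D-DCT coefficients of an image patch as its anchor, and cross-view node pairs are ranked by anchor similarity before fusion. This is an illustrative sketch under our own assumptions, not the authors' implementation; `dct_anchor`, `fuse_views`, and the `keep`/`top_k` parameters are hypothetical names introduced here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows = frequencies)."""
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1 / np.sqrt(2)          # normalize the DC row
    return basis * np.sqrt(2 / n)

def dct_anchor(patch: np.ndarray, keep: int = 4) -> np.ndarray:
    """Flattened low-frequency block of the 2D DCT of a square patch.

    `keep` (hypothetical parameter) controls how many low-frequency
    rows/columns of coefficients form the anchor.
    """
    n = patch.shape[0]
    D = dct_matrix(n)
    coeffs = D @ patch @ D.T            # separable 2D DCT-II
    return coeffs[:keep, :keep].ravel()

def fuse_views(view_a, view_b, top_k: int = 2):
    """Rank cross-view node pairs by cosine similarity of their DCT anchors.

    view_a, view_b: lists of square patches (one per graph node).
    Returns the top_k (i, j) index pairs, most frequency-relevant first.
    """
    A = np.stack([dct_anchor(p) for p in view_a])
    B = np.stack([dct_anchor(p) for p in view_b])
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    sim = A @ B.T                       # pairwise cosine similarity
    order = np.argsort(sim, axis=None)[::-1][:top_k]
    return [tuple(np.unravel_index(i, sim.shape)) for i in order]
```

In this toy, a node whose patch reappears (up to low-frequency content) in another view is selected first, mirroring the paper's stated goal of fusing selective nodes by frequency-domain relevance; the residual-guided graph construction and masked reconstruction stages are omitted.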
Related papers
- Multi-modal and Multi-view Fundus Image Fusion for Retinopathy Diagnosis via Multi-scale Cross-attention and Shifted Window Self-attention [4.076237636695921]
The joint interpretation of multi-modal and multi-view fundus images is critical for retinopathy prevention. We propose a multi-modal fundus image fusion method based on multi-scale cross-attention. We also design a retinopathy diagnosis framework to help ophthalmologists reduce workload and improve diagnostic accuracy.
arXiv Detail & Related papers (2025-04-12T07:06:15Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer [0.257133335028485]
We propose an innovative multi-view network based on transformers to address challenges in mammographic image classification.
Our approach introduces a novel shifted window-based dynamic attention block, facilitating the effective integration of multi-view information.
arXiv Detail & Related papers (2024-02-26T04:41:04Z)
- Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features [5.660131312162423]
Parkinson's Disease (PD) affects millions globally, impacting movement.
Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure.
This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification.
arXiv Detail & Related papers (2023-11-25T02:32:46Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network.
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation [42.804058630251305]
We propose the first multi-view medical report generation model, called MvCo-DoT.
MvCo-DoT first proposes a multi-view contrastive learning (MvCo) strategy to help the deep-reinforcement-learning-based model exploit the consistency of multi-view inputs.
Extensive experiments on the IU X-Ray public dataset show that MvCo-DoT outperforms the SOTA medical report generation baselines in all metrics.
arXiv Detail & Related papers (2023-04-15T03:42:26Z)
- Multi-Scale Relational Graph Convolutional Network for Multiple Instance Learning in Histopathology Images [2.6663738081163726]
We introduce the Multi-Scale Relational Graph Convolutional Network (MS-RGCN) as a multiple instance learning method.
We model histopathology image patches and their relation with neighboring patches and patches at other scales as a graph.
We experiment on prostate cancer histopathology images to predict magnification groups based on the extracted features from patches.
arXiv Detail & Related papers (2022-12-17T02:26:42Z)
- TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers [8.139069987207494]
We present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms.
In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining.
arXiv Detail & Related papers (2022-03-21T04:02:54Z)
- Act Like a Radiologist: Towards Reliable Multi-view Correspondence Reasoning for Mammogram Mass Detection [49.14070210387509]
We propose an Anatomy-aware Graph convolutional Network (AGN) for mammogram mass detection.
AGN is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability.
Experiments on two standard benchmarks reveal that AGN significantly exceeds the state-of-the-art performance.
arXiv Detail & Related papers (2021-05-21T06:48:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.