DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification
- URL: http://arxiv.org/abs/2504.18046v3
- Date: Tue, 09 Sep 2025 09:49:58 GMT
- Title: DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification
- Authors: Guohao Huo, Zibo Lin, Zitong Wang, Ruiting Dai, Hao Tang
- Abstract summary: We propose DMS-Net, a dual-modal multi-scale siamese network for binocular retinal image classification. The framework employs a weight-sharing siamese ResNet-152 architecture to concurrently extract deep semantic features from bilateral fundus images. It achieves state-of-the-art performance with an accuracy of 82.9%, recall of 84.5%, and a Cohen's kappa coefficient of 83.2%.
- Score: 8.86559854172874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ophthalmic diseases pose a significant global health burden. However, traditional diagnostic methods and existing monocular image-based deep learning approaches often overlook the pathological correlations between the two eyes. In practical medical robotic diagnostic scenarios, paired retinal images (binocular fundus images) are frequently required as diagnostic evidence. To address this, we propose DMS-Net, a dual-modal multi-scale siamese network for binocular retinal image classification. The framework employs a weight-sharing siamese ResNet-152 architecture to concurrently extract deep semantic features from bilateral fundus images. To tackle challenges like indistinct lesion boundaries and diffuse pathological distributions, we introduce the OmniPool Spatial Integrator Module (OSIM), which achieves multi-resolution feature aggregation through multi-scale adaptive pooling and spatial attention mechanisms. Furthermore, the Calibrated Analogous Semantic Fusion Module (CASFM) leverages spatial-semantic recalibration and bidirectional attention mechanisms to enhance cross-modal interaction, aggregating modality-agnostic representations of fundus structures. To fully exploit the differential semantic information of lesions present in bilateral fundus features, we introduce the Cross-Modal Contrastive Alignment Module (CCAM). Additionally, to enhance the aggregation of lesion-correlated semantic information, we introduce the Cross-Modal Integrative Alignment Module (CIAM). Evaluation on the ODIR-5K dataset demonstrates that DMS-Net achieves state-of-the-art performance with an accuracy of 82.9%, recall of 84.5%, and a Cohen's kappa coefficient of 83.2%, showcasing robust capacity in detecting symmetrical pathologies and improving clinical decision-making for ocular diseases. Code and the processed dataset will be released subsequently.
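The headline metric, Cohen's kappa, measures classification agreement corrected for chance. As a quick reference (a generic sketch of the standard formula, not the authors' evaluation code), it can be computed directly from paired label sequences:

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    n = len(y_true)
    # Observed agreement: fraction of exact matches.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if the two label distributions were independent.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 83.2% therefore indicates agreement well beyond what chance label overlap would produce.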
Related papers
- Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI [5.332404648315838]
We study ischemic stroke lesion segmentation using multimodal diffusion MRI from the ISLES 2022 dataset. Several state-of-the-art convolutional and transformer-based architectures, including U-Net variants, Swin-UNet, and TransUNet, are benchmarked. Results show that transformer-based models outperform convolutional baselines, and the proposed dual-encoder TransUNet achieves the best performance, reaching a Dice score of 85.4% on the test set.
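The Dice score reported above measures overlap between a predicted and a reference lesion mask. A minimal NumPy sketch of the standard definition (illustrative only, not the paper's implementation):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    eps keeps the ratio defined (and equal to 1.0) when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

A score of 1.0 means perfect overlap; 85.4% indicates the predicted lesion masks recover most of the reference voxels.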
arXiv Detail & Related papers (2025-12-23T15:24:31Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - XOCT: Enhancing OCT to OCTA Translation via Cross-Dimensional Supervised Multi-Scale Feature Learning [10.638764687563555]
XOCT is a novel deep learning framework that integrates Cross-Dimensional Supervision (CDS) with a Multi-Scale Feature Fusion (MSFF) network for vascular reconstruction. Our experiments on the OCTA-500 dataset demonstrate XOCT's improvements, especially for the en-face projections.
arXiv Detail & Related papers (2025-09-09T07:25:36Z) - A Semantic Segmentation Algorithm for Pleural Effusion Based on DBIF-AUNet [22.657295396752023]
Pleural effusion semantic segmentation can significantly enhance the accuracy and timeliness of clinical diagnosis and treatment. Existing methods often struggle with diverse image variations and complex edges. We propose the Dual-Branch Interactive Fusion Attention model (DBIF-AUNet) to address these challenges.
arXiv Detail & Related papers (2025-08-08T10:14:51Z) - Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z) - Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport [28.96009174108652]
Multimodal ophthalmic imaging-based diagnosis integrates color fundus images with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. Existing commonly used pipelines, such as modality imputation and distillation methods, face notable limitations. We propose a novel multimodal alignment and fusion framework capable of robustly handling missing modalities in the task of ophthalmic diagnostics.
arXiv Detail & Related papers (2025-07-07T13:36:39Z) - Towards a Multimodal MRI-Based Foundation Model for Multi-Level Feature Exploration in Segmentation, Molecular Subtyping, and Grading of Glioma [0.2796197251957244]
The Multi-Task S-UNETR (MTSUNET) model is a novel foundation-based framework built on the BrainSegFounder model. It simultaneously performs glioma segmentation, histological subtyping, and neuroimaging subtyping. It shows significant potential for advancing noninvasive, personalized glioma management by improving predictive accuracy and interpretability.
arXiv Detail & Related papers (2025-03-10T01:27:09Z) - Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology [6.418265127069878]
We propose the use of omic embeddings during early and late fusion to capture complementary information from local (patch-level) to global (slide-level) interactions. This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
arXiv Detail & Related papers (2024-11-26T13:25:53Z) - Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model [45.682311387979944]
We propose the first Serpentine Mamba (Serp-Mamba) network to address this challenging task.
We first devise a Serpentine Interwoven Adaptive (SIA) scan mechanism, which scans UWF-SLO images along curved vessel structures in a snake-like crawling manner.
Second, we propose an Ambiguity-Driven Dual Recalibration module to address the category imbalance problem intensified by high-resolution images.
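Serp-Mamba's SIA mechanism scans along curved vessel structures rather than in raster order. As a loose illustration of the general idea of a serpentine (boustrophedon) traversal of an image (a toy sketch, not the authors' adaptive, vessel-following scan), consider:

```python
import numpy as np

def serpentine_scan(img):
    """Flatten a 2-D array in snake-like order:
    left-to-right on even rows, right-to-left on odd rows,
    so consecutive samples in the sequence stay spatially adjacent."""
    rows = [row if i % 2 == 0 else row[::-1]
            for i, row in enumerate(np.asarray(img))]
    return np.concatenate(rows)
```

State-space models like Mamba process such 1-D sequences, so the scan order determines which pixels end up as near neighbors in the sequence.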
arXiv Detail & Related papers (2024-09-06T15:40:47Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
New convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
Model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - Superresolution and Segmentation of OCT scans using Multi-Stage adversarial Guided Attention Training [18.056525121226862]
We propose the multi-stage & multi-discriminatory generative adversarial network (MultiSDGAN) to translate OCT scans into high-resolution segmentation labels.
We evaluate and compare various combinations of channel and spatial attention to the MultiSDGAN architecture to extract more powerful feature maps.
Our results demonstrate relative improvements of 21.44% and 19.45% on the Dice coefficient and SSIM, respectively.
arXiv Detail & Related papers (2022-06-10T00:26:55Z) - InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which CT imaging process is finely embedded.
We analyze the CT values among different tissues, and merge the prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z) - Symmetry-Enhanced Attention Network for Acute Ischemic Infarct Segmentation with Non-Contrast CT Images [50.55978219682419]
We propose a symmetry enhanced attention network (SEAN) for acute ischemic infarct segmentation.
Our proposed network automatically transforms an input CT image into the standard space where the brain tissue is bilaterally symmetric.
The proposed SEAN outperforms some symmetry-based state-of-the-art methods in terms of both dice coefficient and infarct localization.
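SEAN exploits the bilateral symmetry of healthy brain tissue: once an image is aligned to a standard space, asymmetries between the two hemispheres hint at infarcts. A toy sketch of a symmetry-difference map (illustrative of the underlying cue only, not the paper's learned attention mechanism):

```python
import numpy as np

def symmetry_difference(img):
    """Absolute difference between an image and its left-right mirror.

    Near zero where tissue is bilaterally symmetric; large at
    asymmetric regions, which may indicate a lesion."""
    img = np.asarray(img, dtype=float)
    return np.abs(img - img[:, ::-1])
```

In practice such a cue is only meaningful after registration to a symmetric standard space, which is exactly the transform step the summary above describes.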
arXiv Detail & Related papers (2021-10-11T07:13:26Z) - Multi-Modal Multi-Instance Learning for Retinal Disease Recognition [10.294738095942812]
We aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case.
As both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight.
arXiv Detail & Related papers (2021-09-25T08:16:47Z) - Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z) - Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.