Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis
- URL: http://arxiv.org/abs/2505.06982v2
- Date: Mon, 28 Jul 2025 05:12:01 GMT
- Title: Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis
- Authors: Md. Naimur Asif Borno, Md Sakib Hossain Shovon, MD Hanif Sikder, Iffat Firozy Rimi, Tahani Jaser Alahmadi, Mohammad Ali Moni
- Abstract summary: This paper proposes a novel Data-efficient Image Transformer (DeiT) based framework that integrates context-aware multi-scale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner. The proposed model effectively captures both local and global retinal features by leveraging multi-scale patch representations with local and global attention mechanisms.
- Score: 2.1358421658740214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and privacy-preserving diagnosis of ophthalmic diseases remains a critical challenge in medical imaging, particularly given the limitations of existing deep learning models in handling data imbalance, data privacy concerns, spatial feature diversity, and clinical interpretability. This paper proposes a novel Data-efficient Image Transformer (DeiT) based framework that integrates context-aware multi-scale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner. The proposed model effectively captures both local and global retinal features by leveraging multi-scale patch representations with local and global attention mechanisms. LoRA integration enhances computational efficiency by reducing the number of trainable parameters, while federated learning ensures secure, decentralized training without compromising data privacy. A knowledge distillation strategy further improves generalization in data-scarce settings. Comprehensive evaluations on two benchmark datasets, OCTDL and the Eye Disease Image Dataset, demonstrate that the proposed framework consistently outperforms both traditional CNNs and state-of-the-art transformer architectures across key metrics including AUC, F1 score, and precision. Furthermore, Grad-CAM++ visualizations provide interpretable insights into model predictions, supporting clinical trust. This work establishes a strong foundation for scalable, secure, and explainable AI applications in ophthalmic diagnostics.
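The abstract's two efficiency and privacy claims can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: the layer size (768, the DeiT-Base hidden size), the LoRA rank of 8, the two-client setup, and the function names `lora_trainable_params` and `fedavg` are all assumptions chosen for illustration. It counts the trainable weights LoRA would add to one projection matrix and shows a dataset-size-weighted average of client parameters, the usual aggregation step in federated averaging.

```python
# Illustrative sketch only; names, dimensions, and rank are assumptions,
# not values taken from the paper.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# One 768 x 768 projection (768 is the DeiT-Base hidden size; assumed here).
full = 768 * 768                            # weights if the layer is fully fine-tuned
lora = lora_trainable_params(768, 768, 8)   # weights at an assumed rank of 8
fraction = lora / full                      # share of the layer that LoRA trains

# Two hypothetical clinics send only parameter vectors, never images.
global_adapter = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(full, lora, round(fraction, 4), global_adapter)
```

Under these assumed values, LoRA trains roughly 2% of the weights of that single layer, and only parameter vectors of that reduced size would need to be exchanged between sites in each federated round.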
Related papers
- PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning [3.771396977579353]
PRETI is a retinal foundation model that integrates metadata-aware learning with robust self-supervised representation learning. We construct patient-level data pairs, associating images from the same individual to improve robustness against non-clinical variations. Experiments demonstrate PRETI achieves state-of-the-art results across diverse diseases and biomarker predictions.
arXiv Detail & Related papers (2025-05-18T04:59:03Z)
- Enhancing DR Classification with Swin Transformer and Shifted Window Attention [9.99302279736049]
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, underscoring the importance of early detection for effective treatment. We propose a robust preprocessing pipeline incorporating image cropping, Contrast-Limited Adaptive Histogram Equalization (CLAHE), and targeted data augmentation to improve model generalization and resilience. We validate our method on the Aptos and IDRiD datasets for multi-class DR classification, achieving accuracy rates of 89.65% and 97.40%, respectively.
arXiv Detail & Related papers (2025-04-20T13:23:20Z)
- Revisiting Medical Image Retrieval via Knowledge Consolidation [46.6989555659494]
We propose a novel method to consolidate knowledge of hierarchical features and functions. We introduce Depth-aware Representation Fusion (DaRF) and Structure-aware Contrastive Hashing (SCH). Our method achieves a 5.6-38.9% improvement in mean Average Precision on the anatomical radiology dataset.
arXiv Detail & Related papers (2025-03-12T13:16:42Z)
- Multi-Scale Transformer Architecture for Accurate Medical Image Classification [4.578375402082224]
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture. By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model effectively extracts both global and local features. Performance evaluation on the ISIC 2017 dataset demonstrates that the improved Transformer surpasses established AI models.
arXiv Detail & Related papers (2025-02-10T08:22:25Z)
- Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset [11.494899967255142]
Accurate vessel segmentation in UWF-SLO images is crucial for diagnosing retinal diseases.
Manually labeling high-resolution UWF-SLO images is an extremely challenging, time-consuming, and expensive task.
This study introduces a pioneering framework that leverages a patch-based active domain adaptation approach.
arXiv Detail & Related papers (2024-06-19T15:49:06Z)
- Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection [11.980634373191542]
Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis.
This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to address the dual challenges of data privacy and efficient disease diagnosis.
arXiv Detail & Related papers (2024-04-15T09:07:19Z)
- Empowering Healthcare through Privacy-Preserving MRI Analysis [3.6394715554048234]
We introduce the Ensemble-Based Federated Learning (EBFL) Framework.
The EBFL framework deviates from the conventional approach by emphasizing model features over sharing sensitive patient data.
We have achieved remarkable precision in the classification of brain tumors, including glioma, meningioma, pituitary, and non-tumor instances.
arXiv Detail & Related papers (2024-03-14T19:51:18Z)
- Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources [12.119196313470887]
This paper introduces a novel ensemble learning mechanism designed for recognizing retinal diseases under limited resources.
The mechanism leverages insights from multiple pre-trained models, facilitating the transfer and adaptation of their knowledge to Retinal OCT images.
arXiv Detail & Related papers (2024-02-15T06:58:25Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement [34.11633495477596]
We propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME).
A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data.
A pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks.
arXiv Detail & Related papers (2023-12-03T10:01:59Z)
- Leveraging Semi-Supervised Graph Learning for Enhanced Diabetic Retinopathy Detection [0.0]
Diabetic Retinopathy (DR) is a significant cause of blindness globally, highlighting the urgent need for early detection and effective treatment.
Recent advancements in Machine Learning (ML) techniques have shown promise in DR detection, but the availability of labeled data often limits their performance.
This research proposes a novel Semi-Supervised Graph Learning (SSGL) algorithm tailored for DR detection.
arXiv Detail & Related papers (2023-09-02T04:42:08Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z)
- USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution [47.38982697349244]
Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, depth estimation, etc.
We propose incorporating active learning into dense regression models to address this problem.
Active learning allows models to select the most informative samples for labeling, reducing the overall annotation cost while improving performance.
arXiv Detail & Related papers (2023-05-27T16:33:43Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level contrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation.
With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
arXiv Detail & Related papers (2022-02-08T15:12:11Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images is a great help in estimating intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Learning Binary Semantic Embedding for Histology Image Classification and Retrieval [56.34863511025423]
We propose a novel method for Learning Binary Semantic Embedding (LBSE).
Based on the efficient and effective embedding, classification and retrieval are performed to provide interpretable computer-assisted diagnosis for histology images.
Experiments conducted on three benchmark datasets validate the superiority of LBSE under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:36:44Z)
- Multi-label Thoracic Disease Image Classification with Cross-Attention Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images.
We also design a new loss function that goes beyond cross-entropy loss to help the cross-attention process and to overcome the imbalance between classes and the easy-dominated samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.