Related papers: Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

URL: http://arxiv.org/abs/2505.01831v1
Date: Sat, 03 May 2025 14:25:48 GMT
Title: Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement
Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang,
Abstract summary: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis.<n>Recent years have witnessed promising progress in fundus image enhancement.<n>We propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement.
Score: 11.652205644265893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring structural details or global characteristics of fundus images, lacking a unified image enhancement framework to recover comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.

Related papers

FundusGAN: A Hierarchical Feature-Aware Generative Framework for High-Fidelity Fundus Image Generation [35.46876389599076]
FundusGAN is a novel hierarchical feature-aware generative framework specifically designed for high-fidelity fundus image synthesis.<n>We show that FundusGAN consistently outperforms state-of-the-art methods across multiple metrics.
arXiv Detail & Related papers (2025-03-22T18:08:07Z)
RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models [0.7165255458140439]
Vision-Language Foundation Models (VLFM) have shown a tremendous increase in performance in terms of generating high-resolution, photorealistic natural images.<n>We propose a multi-stage architecture where a pre-trained VLFM provides a cursory semantic understanding, while a reinforcement learning algorithm refines the alignment through an iterative process.<n>We demonstrate the effectiveness of our method on a medical imaging skin dataset where the generated images exhibit improved generation quality and alignment with prompt over the fine-tuned Stable Diffusion.
arXiv Detail & Related papers (2025-03-20T01:51:05Z)
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions.<n>Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet) with 600$times$ faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images. We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input. DL models are sensitive to varying artifacts as it leads to changes in the input data distribution between the training and testing phases. We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
Bridging Synthetic and Real Images: a Transferable and Multiple Consistency aided Fundus Image Enhancement Framework [61.74188977009786]
We propose an end-to-end optimized teacher-student framework to simultaneously conduct image enhancement and domain adaptation. We also propose a novel multi-stage multi-attention guided enhancement network (MAGE-Net) as the backbones of our teacher and student network.
arXiv Detail & Related papers (2023-02-23T06:16:15Z)
Histopathology DatasetGAN: Synthesizing Large-Resolution Histopathology Datasets [0.0]
Histopathology datasetGAN (HDGAN) is a framework for image generation and segmentation that scales well to large-resolution histopathology images. We make several adaptations from the original framework, including updating the generative backbone, selectively extracting latent features from the generator, and switching to memory-mapped arrays. We evaluate HDGAN on a thrombotic microangiopathy high-resolution tile dataset, demonstrating strong performance on the high-resolution image-annotation generation task.
arXiv Detail & Related papers (2022-07-06T14:33:50Z)
Multimodal-Boost: Multimodal Medical Image Super-Resolution using Multi-Attention Network with Wavelet Transform [5.416279158834623]
Loss of corresponding image resolution degrades the overall performance of medical image diagnosis. Deep learning based single image super resolution (SISR) algorithms has revolutionized the overall diagnosis framework. This work proposes generative adversarial network (GAN) with deep multi-attention modules to learn high-frequency information from low-frequency data.
arXiv Detail & Related papers (2021-10-22T10:13:46Z)
Multi-modal Aggregation Network for Fast MR Imaging [85.25000133194762]
We propose a novel Multi-modal Aggregation Network, named MANet, which is capable of discovering complementary representations from a fully sampled auxiliary modality. In our MANet, the representations from the fully sampled auxiliary and undersampled target modalities are learned independently through a specific network. Our MANet follows a hybrid domain learning framework, which allows it to simultaneously recover the frequency signal in the $k$-space domain.
arXiv Detail & Related papers (2021-10-15T13:16:59Z)
MRI to PET Cross-Modality Translation using Globally and Locally Aware GAN (GLA-GAN) for Multi-Modal Diagnosis of Alzheimer's Disease [0.6597195879147557]
generative adversarial networks (GANs) with the ability to synthesize realist images have shown great potential as an alternative to standard data augmentation techniques.<n>We propose a novel end-to-end, globally and locally aware image-to-image translation GAN (GLA-GAN) with a multi-path architecture that enforces both global structural integrity and fidelity to local details.
arXiv Detail & Related papers (2021-08-04T16:38:33Z)
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning [62.17532253489087]
Deep learning methods have been shown to produce superior performance on MR image reconstruction. These methods require large amounts of data which is difficult to collect and share due to the high cost of acquisition and medical data privacy regulations. We propose a federated learning (FL) based solution in which we take advantage of the MR data available at different institutions while preserving patients' privacy.
arXiv Detail & Related papers (2021-03-03T03:04:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.