Related papers: WDT-MD: Wavelet Diffusion Transformers for Microaneurysm Detection in Fundus Images

URL: http://arxiv.org/abs/2511.08987v2
Date: Sun, 16 Nov 2025 04:07:00 GMT
Title: WDT-MD: Wavelet Diffusion Transformers for Microaneurysm Detection in Fundus Images
Authors: Yifei Sun, Yuzhi He, Junhao Jia, Jinhong Wang, Ruiquan Ge, Changmiao Wang, Hongxia Xu
Abstract summary: Microaneurysms (MAs) present as sub-60 $μm$ lesions in fundus images with highly variable photometric and morphological characteristics. Diffusion-based anomaly detection has emerged as a promising approach for automated MA screening. We propose a Wavelet Diffusion Transformer framework for MA Detection (WDT-MD). WDT-MD features a noise-encoded image conditioning mechanism to avoid "identity mapping".
Score: 9.271194324930098
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Microaneurysms (MAs), the earliest pathognomonic signs of Diabetic Retinopathy (DR), present as sub-60 $μm$ lesions in fundus images with highly variable photometric and morphological characteristics, rendering manual screening not only labor-intensive but inherently error-prone. While diffusion-based anomaly detection has emerged as a promising approach for automated MA screening, its clinical application is hindered by three fundamental limitations. First, these models often fall prey to "identity mapping", where they inadvertently replicate the input image. Second, they struggle to distinguish MAs from other anomalies, leading to high false positives. Third, their suboptimal reconstruction of normal features hampers overall performance. To address these challenges, we propose a Wavelet Diffusion Transformer framework for MA Detection (WDT-MD), which features three key innovations: a noise-encoded image conditioning mechanism to avoid "identity mapping" by perturbing image conditions during training; pseudo-normal pattern synthesis via inpainting to introduce pixel-level supervision, enabling discrimination between MAs and other anomalies; and a wavelet diffusion Transformer architecture that combines the global modeling capability of diffusion Transformers with multi-scale wavelet analysis to enhance reconstruction of normal retinal features. Comprehensive experiments on the IDRiD and e-ophtha MA datasets demonstrate that WDT-MD outperforms state-of-the-art methods in both pixel-level and image-level MA detection. This advancement holds significant promise for improving early DR screening.
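
As a rough illustration of the "multi-scale wavelet analysis" the abstract mentions, the sketch below applies a single-level 2D Haar transform to an image array and inverts it. This is not the WDT-MD architecture and is not taken from the paper; the choice of the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the use of NumPy are all assumptions made for the example. It only shows how an image splits into one coarse band and three detail bands, where small, sharp structures tend to show up.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform (hypothetical helper, not from the paper).

    Expects a 2D array with even height and width. Returns four half-size
    bands: a coarse approximation and three detail bands.
    """
    x = np.asarray(x, dtype=float)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0    # scaled local average (coarse band)
    col_diff = (a - b + c - d) / 2.0  # left column minus right column
    row_diff = (a + b - c - d) / 2.0  # top row minus bottom row
    diag_diff = (a - b - c + d) / 2.0 # diagonal minus anti-diagonal
    return approx, col_diff, row_diff, diag_diff


def haar_idwt2(approx, col_diff, row_diff, diag_diff):
    """Inverse of haar_dwt2: rebuilds the full-resolution array exactly."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (approx + col_diff + row_diff + diag_diff) / 2.0
    x[0::2, 1::2] = (approx - col_diff + row_diff - diag_diff) / 2.0
    x[1::2, 0::2] = (approx + col_diff - row_diff - diag_diff) / 2.0
    x[1::2, 1::2] = (approx - col_diff - row_diff + diag_diff) / 2.0
    return x


if __name__ == "__main__":
    # A flat image with one bright pixel stands in for a tiny lesion.
    image = np.zeros((8, 8))
    image[2, 3] = 1.0
    approx, col_diff, row_diff, diag_diff = haar_dwt2(image)
    print("non-zero detail coefficients:", np.count_nonzero(diag_diff))
    print("round trip ok:", np.allclose(haar_idwt2(approx, col_diff, row_diff, diag_diff), image))
```

Applying `haar_dwt2` again to the coarse band would give the next scale; repeating this is one common way to build a multi-scale representation.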

Related papers

Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
3D Wavelet Latent Diffusion Model for Whole-Body MR-to-CT Modality Translation [13.252652406393205]
Existing MR-to-CT methods for whole-body imaging often suffer from poor spatial alignment between the generated CT and input MR images. We present a novel 3D Wavelet Latent Diffusion Model (3D-WLDM) that addresses these limitations. By incorporating a Wavelet Residual Module into the encoder-decoder architecture, we enhance the capture and reconstruction of fine-scale features across image and latent spaces.
arXiv Detail & Related papers (2025-07-14T06:17:05Z)
Harnessing EHRs for Diffusion-based Anomaly Detection on Chest X-rays [10.062242117926177]
Unsupervised anomaly detection (UAD) in medical imaging is crucial for identifying pathological abnormalities without requiring extensive labeled data. We propose Diff3M, a multi-modal diffusion-based framework that integrates chest X-rays and structured Electronic Health Records for enhanced anomaly detection.
arXiv Detail & Related papers (2025-05-22T22:02:47Z)
Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs [0.0]
We propose a novel Deformation-aware GAN (DA-GAN) to dynamically correct the misalignment during the image synthesis based on inverse consistency. Experimental results show that DA-GAN achieved superior performance on a public dataset with simulated misalignments and a real-world lung MRI-CT dataset with respiratory motion misalignment.
arXiv Detail & Related papers (2024-08-18T10:29:35Z)
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples. It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping [12.629091097618792]
Quantitative Susceptibility Mapping (QSM) is an inverse problem for magnetic susceptibility distributions from MRI tissue phases. Recent developments in diffusion models have demonstrated potential for solving 2D medical imaging inverse problems. We developed a 3D image patch-based diffusion model, namely QSMDiff, for robust QSM reconstruction across different scan parameters.
arXiv Detail & Related papers (2024-03-21T01:37:50Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification. The proposed framework has been validated through comprehensive experiments on two clinical datasets. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
Latent Diffusion Model for Medical Image Standardization and Enhancement [11.295078152769559]
DiffusionCT is a score-based DDPM model that transforms disparate non-standard distributions into a standardized form. The architecture comprises a U-Net-based encoder-decoder, augmented by a DDPM model integrated at the bottleneck position. Empirical tests on patient CT images indicate notable improvements in image standardization using DiffusionCT.
arXiv Detail & Related papers (2023-10-08T17:11:14Z)
Diffusion Models for Counterfactual Generation and Anomaly Detection in Brain Images [39.94162291765236]
We present a weakly supervised method to generate a healthy version of a diseased image and then use it to obtain a pixel-wise anomaly map. We employ a diffusion model trained on healthy samples and combine Denoising Diffusion Probabilistic Model (DDPM) and Denoising Diffusion Implicit Model (DDIM) at each step of the sampling process.
arXiv Detail & Related papers (2023-08-03T21:56:50Z)
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input. DL models are sensitive to varying artifacts, which change the input data distribution between the training and testing phases. We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
Negligible effect of brain MRI data preprocessing for tumor segmentation [36.89606202543839]
We conduct experiments on three publicly available datasets and evaluate the effect of different preprocessing steps in deep neural networks. Our results demonstrate that most popular standardization steps add no value to the network performance. We suggest that image intensity normalization approaches do not contribute to model accuracy because of the reduction of signal variance with image standardization.
arXiv Detail & Related papers (2022-04-11T17:29:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.