Related papers: D3R-Net: Dual-Domain Denoising Reconstruction Network for Robust Industrial Anomaly Detection

D3R-Net: Dual-Domain Denoising Reconstruction Network for Robust Industrial Anomaly Detection

URL: http://arxiv.org/abs/2602.00126v1
Date: Tue, 27 Jan 2026 23:21:59 GMT
Title: D3R-Net: Dual-Domain Denoising Reconstruction Network for Robust Industrial Anomaly Detection
Authors: Dmytro Filatov, Valentyn Fedorov, Vira Filatova, Andrii Zelenchuk,
Abstract summary: Unsupervised anomaly detection (UAD) is a key ingredient of automated visual inspection in modern manufacturing.<n>We introduce D3R-Net, a Dual-Domain Denoising Reconstruction framework that couples a self-supervised 'healing' task with frequency-aware regularization.<n>In addition to the spatial mean squared error, we employ a Fast Fourier Transform (FFT) magnitude loss that encourages consistency in the frequency domain.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Unsupervised anomaly detection (UAD) is a key ingredient of automated visual inspection in modern manufacturing. The reconstruction-based methods appeal because they have basic architectural design and they process data quickly but they produce oversmoothed results for high-frequency details. As a result, subtle defects are partially reconstructed rather than highlighted, which limits segmentation accuracy. We build on this line of work and introduce D3R-Net, a Dual-Domain Denoising Reconstruction framework that couples a self-supervised 'healing' task with frequency-aware regularization. During training, the network receives synthetically corrupted normal images and is asked to reconstruct the clean targets, which prevents trivial identity mapping and pushes the model to learn the manifold of defect-free textures. In addition to the spatial mean squared error, we employ a Fast Fourier Transform (FFT) magnitude loss that encourages consistency in the frequency domain. The implementation also allows an optional structural similarity (SSIM) term, which we study in an ablation. On the MVTec AD Hazelnut benchmark, D3R-Net with the FFT loss improves localization consistency over a spatial-only baseline: PRO AUC increases from 0.603 to 0.687, while image-level ROC AUC remains robust. Evaluated across fifteen MVTec categories, the FFT variant raises the average pixel ROC AUC from 0.733 to 0.751 and PRO AUC from 0.417 to 0.468 compared to the MSE-only baseline, at roughly 20 FPS on a single GPU. The network is trained from scratch and uses a lightweight convolutional autoencoder backbone, providing a practical alternative to heavy pre-trained feature embedding methods.

Related papers

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation [47.409626500688866]
We present the DINO Spherical Autoencoder (DINO-SAE), a framework that bridges semantic representation and pixel-level reconstruction.<n>Our approach achieves state-of-the-art reconstruction quality, reaching 0.37 rFID and 26.2 dB PSNR, while maintaining strong semantic alignment to the pretrained VFM.
arXiv Detail & Related papers (2026-01-30T12:25:34Z)
Fourier-RWKV: A Multi-State Perception Network for Efficient Image Dehazing [26.57698394898644]
We propose a novel dehazing framework based on a Multi-State Perception paradigm.<n>Fourier-RWKV delivers state-of-the-art performance across diverse haze scenarios.
arXiv Detail & Related papers (2025-12-09T01:35:56Z)
DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object Detection [16.16000521213211]
Small object detection in UAV remote sensing images is difficult.<n>Current transformer-based detectors struggle with three critical issues.<n>We introduce DFIR-DETR to tackle these problems through dynamic feature aggregation combined with frequency-domain processing.
arXiv Detail & Related papers (2025-12-08T01:25:10Z)
MeanFlow Transformers with Representation Autoencoders [71.45823902973349]
MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data.<n>We develop an efficient training and sampling scheme for MF in the latent space of a Representation Autoencoder (RAE)<n>We achieve a 1-step FID of 2.03, outperforming vanilla MF's 3.43, while reducing sampling GFLOPS by 38% and total training cost by 83% on ImageNet 256.
arXiv Detail & Related papers (2025-11-17T06:17:08Z)
T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis [15.624549727053475]
Existing model-merging techniques fail to deliver consistent gains across diverse medical modalities.<n>We introduce Test-Time Task adaptive merging (T3), a backpropagation-free framework that computes per-sample coefficients.<n>We present a rigorous cross-evaluation protocol spanning in-domain, base-to-novel, and corruptions across four modalities.
arXiv Detail & Related papers (2025-10-31T08:05:40Z)
Single-Step Reconstruction-Free Anomaly Detection and Segmentation via Diffusion Models [1.1487074612765584]
We introduce Reconstruction-free Anomaly Detection with Attention-based diffusion models in Real-time (RADAR)<n>RADAR overcomes the limitations of reconstruction-based anomaly detection.<n>We evaluate RADAR on real-world 3D-printed material and the MVTec-AD dataset.
arXiv Detail & Related papers (2025-08-06T18:56:08Z)
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings [106.3726679697804]
We compare the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE)<n>MPEs are seen as the standard for low dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail.<n>We prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding.
arXiv Detail & Related papers (2025-04-18T02:18:08Z)
FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image [64.96903230497755]
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image.<n>The core of FOF is to factorize a 3D occupancy field into a 2D vector field, retaining topology and spatial relationships within the 3D domain.<n>Based on FOF, we design a new reconstruction framework, FOF-X, to avoid the performance degradation caused by texture and lighting.
arXiv Detail & Related papers (2024-12-08T14:46:29Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Difusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.