HarmoniAD: Harmonizing Local Structures and Global Semantics for Anomaly Detection
- URL: http://arxiv.org/abs/2601.00327v1
- Date: Thu, 01 Jan 2026 12:45:45 GMT
- Title: HarmoniAD: Harmonizing Local Structures and Global Semantics for Anomaly Detection
- Authors: Naiqi Zhang, Chuancheng Shi, Jingtong Dou, Wenhua Wu, Fei Shen, Jianhua Cao,
- Abstract summary: Anomaly detection is crucial in industrial product quality inspection. Existing methods face a structure-semantics trade-off. HarmoniAD is a frequency-guided dual-branch framework.
- Score: 4.679561335065019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomaly detection is crucial in industrial product quality inspection. Failing to detect tiny defects often leads to serious consequences. Existing methods face a structure-semantics trade-off: structure-oriented models (such as frequency-based filters) are noise-sensitive, while semantics-oriented models (such as CLIP-based encoders) often miss fine details. To address this, we propose HarmoniAD, a frequency-guided dual-branch framework. Features are first extracted by the CLIP image encoder, then transformed into the frequency domain, and finally decoupled into high- and low-frequency paths for complementary modeling of structure and semantics. The high-frequency branch is equipped with a fine-grained structural attention module (FSAM) to enhance textures and edges for detecting small anomalies, while the low-frequency branch uses a global structural context module (GSCM) to capture long-range dependencies and preserve semantic consistency. Together, these branches balance fine detail and global semantics. HarmoniAD further adopts a multi-class joint training strategy, and experiments on MVTec-AD, VisA, and BTAD show state-of-the-art performance with both sensitivity and robustness.
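The frequency decoupling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `split_frequency_bands`, the radial low-pass mask, and the `cutoff_ratio` parameter are all assumptions made for illustration, and a random array stands in for CLIP image-encoder features. It only shows the general idea of transforming a feature map to the frequency domain and separating it into complementary low- and high-frequency paths.

```python
import numpy as np


def split_frequency_bands(features, cutoff_ratio=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Hypothetical helper: the radial mask and cutoff_ratio are illustrative
    assumptions, not the decomposition used in the paper.
    """
    _, h, w = features.shape

    # 2D FFT over the spatial axes, shifted so the DC term sits at the center.
    spectrum = np.fft.fftshift(
        np.fft.fft2(features, axes=(-2, -1)), axes=(-2, -1)
    )

    # Distance of every frequency bin from the centered DC component.
    yy, xx = np.meshgrid(
        np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij"
    )
    radius = np.sqrt(yy**2 + xx**2)

    # Bins inside the radius form the low-frequency band; the rest are high.
    low_mask = radius <= cutoff_ratio * min(h, w) / 2

    def to_spatial(spec):
        # Undo the shift and invert the FFT; the imaginary part is round-off.
        return np.fft.ifft2(
            np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1)
        ).real

    low = to_spatial(spectrum * low_mask)
    high = to_spatial(spectrum * ~low_mask)
    return low, high


# Stand-in for encoder features: 4 channels on a 16x16 spatial grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 16))
low, high = split_frequency_bands(feats)
```

Because the two masks are complementary, `low + high` reconstructs the input up to floating-point error. In a dual-branch design, each part would then be passed to its own module (structure-oriented for `high`, context-oriented for `low`) before the outputs are fused.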
Related papers
- Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling [8.24377869183113]
Small object detection under complex backgrounds is a challenging task due to severe feature degradation, weak semantic representation, and inaccurate localization. Existing detection frameworks are mainly designed for general objects. We propose a multi-level feature enhancement and global relation modeling framework tailored for small object detection.
arXiv Detail & Related papers (2026-03-04T06:57:46Z)
- SpectralMamba-UNet: Frequency-Disentangled State Space Modeling for Texture-Structure Consistent Medical Image Segmentation [14.42559964239819]
We propose SpectralMamba-UNet to decouple the learning of structural and textural information in the spectral domain. Experiments on five public benchmarks demonstrate consistent improvements across diverse modalities and segmentation targets.
arXiv Detail & Related papers (2026-02-26T15:17:42Z)
- Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models [53.79667447811139]
We show that a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader estimation.
arXiv Detail & Related papers (2026-02-04T15:25:02Z)
- SKANet: A Cognitive Dual-Stream Framework with Adaptive Modality Fusion for Robust Compound GNSS Interference Classification [47.20483076887704]
Global Navigation Satellite Systems (GNSS) face growing threats from sophisticated jamming interference. We propose a cognitive deep learning framework built upon a dual-stream architecture that integrates Time-Frequency Images (TFIs) and Power Spectral Density (PSD). We show that SKANet achieves an overall accuracy of 96.99%, exhibiting superior robustness for compound jamming classification.
arXiv Detail & Related papers (2026-01-19T07:42:45Z)
- Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition [2.0391237204597363]
Speech Emotion Recognition systems often degrade in performance when exposed to unpredictable acoustic interference. We propose a Hybrid Transformer-CNN framework that unifies the contextual modeling of Wav2Vec 2.0 with the spectral stability of 1D-Convolutional Neural Networks.
arXiv Detail & Related papers (2025-12-20T10:05:58Z)
- T3-Tracer: A Tri-level Temporal-Aware Framework for Audio Forgery Detection and Localization [15.253944377996477]
T3-Tracer is a framework that jointly analyzes audio at the frame, segment, and audio levels to comprehensively detect forgery traces. FA-FAM is designed to detect the authenticity of each audio frame. It combines both frame-level and audio-level temporal information to detect intra-frame forgery cues and global semantic inconsistencies. It adopts a dual-branch architecture that jointly models frame features and inter-frame differences across multi-scale temporal windows, effectively identifying abrupt anomalies that appear at forged boundaries.
arXiv Detail & Related papers (2025-11-26T10:07:03Z)
- WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition [61.3530659856013]
We propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in the spatial and wavelet domains. High-frequency components are first learned from input images as explicit priors to reinforce boundary details. Experiments on standard benchmarks demonstrate that WaveSeg, leveraging a wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-24T01:41:31Z)
- Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation [60.9960601057956]
We introduce the Frequency-Aware Audio-Visual Segmentation (FAVS) framework, which consists of two key modules. The FAVS framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-09-23T12:33:48Z)
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
- Dual Semantic-Aware Network for Noise Suppressed Ultrasound Video Segmentation [21.117226880898418]
We propose a novel framework designed to enhance noise robustness in ultrasound video segmentation. The Dual Semantic-Aware Network (DSANet) fosters mutual semantic awareness between local and global features. Because our model avoids pixel-level feature dependencies, it achieves significantly higher inference FPS than video-based methods and even surpasses some image-based models.
arXiv Detail & Related papers (2025-07-10T05:41:17Z)
- A Noise-Resilient Semi-Supervised Graph Autoencoder for Overlapping Semantic Community Detection [0.0]
Community detection in networks with overlapping structures remains a significant challenge. We propose a semi-supervised graph autoencoder that combines graph multi-head attention and modularity to robustly detect overlapping communities. Key innovations include a noise-resistant architecture and a semantic semi-supervised design optimized for community quality.
arXiv Detail & Related papers (2025-05-09T11:34:07Z)
- FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.