Learning Multi-view Multi-class Anomaly Detection
- URL: http://arxiv.org/abs/2504.21294v1
- Date: Wed, 30 Apr 2025 03:59:58 GMT
- Title: Learning Multi-view Multi-class Anomaly Detection
- Authors: Qianzi Yu, Yang Cao, Yu Kang
- Abstract summary: We introduce a Multi-View Multi-Class Anomaly Detection model (MVMCAD), which integrates information from multiple views to accurately identify anomalies. Specifically, we propose a semi-frozen encoder, where a pre-encoder prior enhancement mechanism is added before the frozen encoder. We further propose an Anomaly Amplification Module (AAM) that models global token interactions and suppresses normal regions, and a Cross-Feature Loss that aligns shallow encoder features with deep decoder features.
- Score: 10.199404082194947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The latest trend in anomaly detection is to train a unified model instead of training a separate model for each category. However, existing multi-class anomaly detection (MCAD) models perform poorly in multi-view scenarios because they often fail to effectively model the relationships and complementary information among different views. In this paper, we introduce a Multi-View Multi-Class Anomaly Detection model (MVMCAD), which integrates information from multiple views to accurately identify anomalies. Specifically, we propose a semi-frozen encoder, where a pre-encoder prior enhancement mechanism is added before the frozen encoder, enabling stable cross-view feature modeling and efficient adaptation for improved anomaly detection. Furthermore, we propose an Anomaly Amplification Module (AAM) that models global token interactions and suppresses normal regions to enhance anomaly signals, leading to improved detection performance in multi-view settings. Finally, we propose a Cross-Feature Loss that aligns shallow encoder features with deep decoder features and vice versa, enhancing the model's sensitivity to anomalies at different semantic levels under multi-view scenarios. Extensive experiments on the Real-IAD dataset for multi-view multi-class anomaly detection validate the effectiveness of our approach, achieving state-of-the-art performance of 91.0/88.6/82.1 and 99.1/43.9/48.2/95.2 at the image level and pixel level, respectively.
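The bidirectional Cross-Feature Loss described in the abstract can be illustrated with a minimal sketch: shallow encoder features are pulled toward deep decoder features, and deep encoder features toward shallow decoder features. The cosine-distance formulation, the function names, and the assumption that both feature sets have already been projected to a common dimension are ours, not the authors' exact implementation:

```python
import numpy as np

def cosine_align(a, b):
    """Mean cosine distance between two sets of feature vectors of shape (N, D)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

def cross_feature_loss(enc_shallow, dec_deep, enc_deep, dec_shallow):
    """Align shallow encoder features with deep decoder features and vice versa.

    All four inputs are assumed to be (N, D) features already mapped to a
    shared dimension D; in practice a learned projection would do this.
    """
    return cosine_align(enc_shallow, dec_deep) + cosine_align(enc_deep, dec_shallow)
```

With identical encoder and decoder features the loss is zero, and it grows as the two feature sets diverge in direction, which is the sensitivity to anomalies the abstract refers to.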
Related papers
- Multi-View Industrial Anomaly Detection with Epipolar Constrained Cross-View Fusion [15.819291772583393]
We introduce an epipolar geometry-constrained attention module to guide cross-view fusion. To further enhance the potential of cross-view attention, we propose a pretraining strategy inspired by memory bank-based anomaly detection. We demonstrate that our framework outperforms existing methods on the state-of-the-art multi-view anomaly detection dataset.
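An epipolar constraint of this kind restricts cross-view attention to token pairs lying near each other's epipolar lines. A minimal sketch, assuming a known fundamental matrix `F` and token coordinates in pixel space (the function name and the distance threshold are hypothetical, not from the paper):

```python
import numpy as np

def epipolar_attention_mask(F, pts1, pts2, tau=2.0):
    """Boolean mask allowing a view-1 token to attend only to view-2 tokens
    within tau pixels of its epipolar line l = F @ [x, y, 1]."""
    h1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # (N, 3) homogeneous
    h2 = np.hstack([pts2, np.ones((len(pts2), 1))])   # (M, 3) homogeneous
    lines = h1 @ F.T                                  # (N, 3): epipolar lines in view 2
    # point-to-line distance |ax + by + c| / sqrt(a^2 + b^2)
    num = np.abs(lines @ h2.T)                        # (N, M)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-9
    return (num / den) < tau
```

For rectified views (pure horizontal translation), epipolar lines are horizontal, so the mask simply admits tokens on roughly the same image row.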
arXiv Detail & Related papers (2025-03-14T05:02:54Z) - Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation [61.64052577026623]
Real-world multi-view datasets are often heterogeneous and imperfect. We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. In experiments, we employ it in unsupervised multi-view clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval.
arXiv Detail & Related papers (2025-03-06T07:01:08Z) - Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD).
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
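A residual adapter of the kind this summary describes can be sketched as a small bottleneck MLP added on top of a frozen feature level; with the up-projection initialized to zero, the adapter starts as an identity mapping and only gradually modifies the pre-trained features. The shapes and initialization here are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ResidualAdapter:
    """Bottleneck MLP on a frozen feature level: out = x + up(relu(down(x)))."""
    def __init__(self, dim, bottleneck, rng):
        self.W_down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.W_up = np.zeros((bottleneck, dim))  # zero-init: adapter starts as identity
    def __call__(self, x):
        return x + relu(x @ self.W_down) @ self.W_up
```

Stacking one such adapter per encoder level is one way to realize the "stepwise enhancement of visual features across different levels" that the summary mentions.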
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - mixed attention auto encoder for multi-class industrial anomaly detection [2.8519768339207356]
We propose a unified mixed-attention auto encoder (MAAE) to implement multi-class anomaly detection with a single model.
To alleviate the performance degradation due to the diverse distribution patterns of different categories, we employ spatial attentions and channel attentions.
MAAE delivers remarkable performances on the benchmark dataset compared with the state-of-the-art methods.
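The spatial and channel attentions mentioned above can be illustrated with a parameter-free sketch. The real MAAE attentions are learned; this only shows the reweighting pattern, with softmax pooling as an assumed stand-in:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    """feat: (C, H, W). Pool over space, reweight channels."""
    w = softmax(feat.mean(axis=(1, 2)))        # (C,) channel weights, sum to 1
    return feat * w[:, None, None]

def spatial_attention(feat):
    """feat: (C, H, W). Pool over channels, reweight spatial locations."""
    s = feat.mean(axis=0)                      # (H, W)
    w = softmax(s.ravel()).reshape(s.shape)    # spatial weights, sum to 1
    return feat * w[None, :, :]
```

Applying the two in sequence emphasizes category-specific channels and regions, which is the mechanism MAAE uses against the diverse distribution patterns of different categories.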
arXiv Detail & Related papers (2023-09-22T08:17:48Z) - LafitE: Latent Diffusion Model with Feature Editing for Unsupervised Multi-class Anomaly Detection [12.596635603629725]
We develop a unified model to detect anomalies from objects belonging to multiple classes when only normal data is accessible.
We first explore the generative-based approach and investigate latent diffusion models for reconstruction.
We introduce a feature editing strategy that modifies the input feature space of the diffusion model to further alleviate "identity shortcuts".
arXiv Detail & Related papers (2023-07-16T14:41:22Z) - Diversity-Measurable Anomaly Detection [106.07413438216416]
We propose Diversity-Measurable Anomaly Detection (DMAD) framework to enhance reconstruction diversity.
The proposed Pyramid Deformation Module (PDM) essentially decouples deformation from embedding and makes the final anomaly score more reliable.
arXiv Detail & Related papers (2023-03-09T05:52:42Z) - Siamese Transition Masked Autoencoders as Uniform Unsupervised Visual Anomaly Detector [4.33060257697635]
This paper proposes a novel framework termed Siamese Transition Masked Autoencoders (ST-MAE) to handle various visual anomaly detection tasks uniformly.
Our deep feature transition scheme yields an unsupervised, semantic self-supervisory task to extract normal patterns.
arXiv Detail & Related papers (2022-11-01T09:45:49Z) - Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z) - Improving unsupervised anomaly localization by applying multi-scale memories to autoencoders [14.075973859711567]
We propose MMAE, which updates slots at the corresponding resolution scale as prototype features during unsupervised learning.
For anomaly detection, we accomplish anomaly removal by replacing the original encoded image features at each scale with most relevant prototype features.
Experimental results on various datasets testify that our MMAE successfully removes anomalies at different scales and performs favorably on several datasets.
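The prototype-replacement step described above can be sketched as a nearest-neighbor lookup into a per-scale memory bank (the function name and the L2 metric are assumptions; MMAE's actual slot addressing may differ):

```python
import numpy as np

def replace_with_prototypes(feat, memory):
    """Replace each feature vector with its nearest prototype from the memory bank.

    feat:   (N, D) encoded features at one resolution scale
    memory: (K, D) prototype slots learned for that scale
    """
    # Squared L2 distance between every feature and every prototype
    d = ((feat[:, None, :] - memory[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)
    return memory[idx]
```

In an autoencoder this would run per scale before decoding, so anomalous features are snapped back to their closest normal prototype and the resulting reconstruction error highlights the anomaly.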
arXiv Detail & Related papers (2020-12-21T04:44:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.