AnoViT: Unsupervised Anomaly Detection and Localization with Vision
Transformer-based Encoder-Decoder
- URL: http://arxiv.org/abs/2203.10808v1
- Date: Mon, 21 Mar 2022 09:01:37 GMT
- Title: AnoViT: Unsupervised Anomaly Detection and Localization with Vision
Transformer-based Encoder-Decoder
- Authors: Yunseung Lee, Pilsung Kang
- Abstract summary: We propose a vision transformer-based encoder-decoder model, named AnoViT, to reflect normal information by additionally learning the global relationship between image patches.
The proposed model performed better than the convolution-based model on three benchmark datasets.
- Score: 3.31490164885582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image anomaly detection problems aim to determine whether an image is
abnormal, and to detect anomalous areas. These methods are actively used in
various fields such as manufacturing, medical care, and intelligent
information. Encoder-decoder structures have been widely used in the field of
anomaly detection because they can easily learn normal patterns in an
unsupervised learning environment and calculate a score to identify
abnormalities through a reconstruction error indicating the difference between
input and reconstructed images. Therefore, current image anomaly detection
methods have commonly used convolutional encoder-decoders to extract normal
information through the local features of images. However, they are limited in
that only local features of the image can be utilized when constructing a
normal representation owing to the characteristics of convolution operations
using a filter of fixed size. To address this limitation, we propose a vision transformer-based
encoder-decoder model, named AnoViT, designed to reflect normal information by
additionally learning the global relationship between image patches, which is
capable of both image anomaly detection and localization. The proposed approach
constructs a feature map that maintains the existing location information of
individual patches by using the embeddings of all patches passed through
multiple self-attention layers. The proposed AnoViT model performed better than
the convolution-based model on three benchmark datasets. In MVTecAD, which is a
representative benchmark dataset for anomaly localization, it showed improved
results on 10 out of 15 classes compared with the baseline. Furthermore, the
proposed method showed good performance regardless of the class and type of the
anomalous area when localization results were evaluated qualitatively.
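The approach described above amounts to a patch-embedding encoder with global self-attention feeding a decoder that reconstructs the image, with anomalies scored and localized via the reconstruction error. Below is a minimal sketch of that general recipe; it is not the authors' exact AnoViT configuration, and the patch size, embedding dimension, depth, and decoder layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViTEncoderDecoder(nn.Module):
    """Patch-embedding encoder with global self-attention, convolutional decoder."""
    def __init__(self, img_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch_size                        # patches per side
        self.patch_embed = nn.Conv2d(3, dim, patch_size, patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Patch embeddings are reshaped back into a spatial feature map, so the
        # decoder keeps each patch's original location information.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 128, 4, 4), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 2, 2), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 2, 2), nn.Sigmoid(),
        )

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)            # global patch relations
        fmap = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.decoder(fmap)                                 # reconstructed image

def anomaly_map(model, x):
    """Per-pixel reconstruction error; its mean serves as the image-level score."""
    with torch.no_grad():
        err = (x - model(x)) ** 2
    return err.mean(dim=1)                                        # (B, H, W)

model = ViTEncoderDecoder().eval()
x = torch.rand(2, 3, 224, 224)                  # stand-in image batch
amap = anomaly_map(model, x)                    # anomaly localization map
score = amap.flatten(1).mean(dim=1)             # image-level detection score
print(amap.shape, score.shape)
```

In practice the model would be trained on normal images only (e.g., with an L2 reconstruction loss), and the resulting anomaly map would be thresholded or compared against ground-truth masks for localization.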
Related papers
- GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that applies straightforward operations, such as noise addition and shuffling, to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results on six and on-par performance on the remaining ones.
arXiv Detail & Related papers (2024-07-17T09:27:41Z)
- A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection [8.512184778338806]
It consists of a well pre-trained encoder to extract hierarchical feature representations and a decoder to reconstruct these intermediate features from the encoder.
The anomalies can be detected when the decoder fails to reconstruct features well, and then errors of hierarchical feature reconstruction are aggregated into an anomaly map to achieve anomaly localization.
Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets.
arXiv Detail & Related papers (2024-05-15T07:20:27Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to the Stable Diffusion denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic spatio-temporal pseudo-anomalies (PAs) by inpainting a masked-out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the one-class classification (OCC) setting.
Our method performs on par with existing state-of-the-art PA-generation and reconstruction-based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
- Weakly-supervised deepfake localization in diffusion-generated images [4.548755617115687]
We address a weakly-supervised localization problem, using the Xception network as the backbone architecture.
We show that the best-performing detection method (based on local scores) is less sensitive to the looser supervision than to a mismatch in dataset or generator.
arXiv Detail & Related papers (2023-11-08T10:27:36Z)
- CRADL: Contrastive Representations for Unsupervised Anomaly Detection and Localization [2.8659934481869715]
Unsupervised anomaly detection in medical imaging aims to detect and localize arbitrary anomalies without requiring anomalous data during training.
Most current state-of-the-art methods use latent variable generative models operating directly on the images.
We propose CRADL, whose core idea is to model the distribution of normal samples directly in the low-dimensional representation space of an encoder trained with a contrastive pretext task.
arXiv Detail & Related papers (2023-01-05T16:07:49Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
- Inpainting Transformer for Anomaly Detection [0.0]
Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches.
InTra achieves better than state-of-the-art results on the MVTec AD dataset for detection and localization.
arXiv Detail & Related papers (2021-04-28T17:27:44Z)
- CutPaste: Self-Supervised Learning for Anomaly Detection and Localization [59.719925639875036]
We propose a framework for building anomaly detectors using normal training data only.
We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations.
Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects.
arXiv Detail & Related papers (2021-04-08T19:04:55Z)
- Iterative energy-based projection on a normal data manifold for anomaly localization [3.785123406103385]
We propose a new approach for projecting anomalous data onto an autoencoder-learned normal data manifold.
By iteratively updating the input of the autoencoder, we bypass the loss of high-frequency information caused by the autoencoder bottleneck.
arXiv Detail & Related papers (2020-02-10T13:35:41Z)
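As a companion to the last entry above, here is a minimal sketch of the general idea of projecting an input onto an autoencoder-learned normal manifold by taking gradient steps on the input itself; the tiny autoencoder, step size, and iteration count are illustrative assumptions, not the paper's exact energy formulation.

```python
import torch
import torch.nn as nn

class TinyConvAE(nn.Module):
    """Stand-in convolutional autoencoder, assumed to be trained on normal images only."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, 2), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 2, 2), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

def project_to_normal_manifold(ae, x, steps=50, lr=0.05):
    # Iteratively update the *input* (not the weights) so that the autoencoder can
    # reconstruct it well, i.e. move it toward the learned normal-data manifold.
    x_proj = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_proj], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((ae(x_proj) - x_proj) ** 2).mean()   # reconstruction "energy"
        loss.backward()
        opt.step()
    return x_proj.detach().clamp(0, 1)

def localization_map(ae, x):
    """Per-pixel anomaly map: gap between the input and its projection."""
    return (x - project_to_normal_manifold(ae, x)).abs().mean(dim=1)   # (B, H, W)

ae = TinyConvAE().eval()                 # pretend this was trained on normal data
x = torch.rand(1, 3, 64, 64)             # stand-in test image
print(localization_map(ae, x).shape)     # torch.Size([1, 64, 64])
```

Because the update operates on the input rather than relying on a single forward pass, high-frequency detail lost in the autoencoder bottleneck is not penalized, which is the motivation stated in that entry.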
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its contents (including all information) and is not responsible for any consequences.