AnoViT: Unsupervised Anomaly Detection and Localization with Vision
Transformer-based Encoder-Decoder
- URL: http://arxiv.org/abs/2203.10808v1
- Date: Mon, 21 Mar 2022 09:01:37 GMT
- Title: AnoViT: Unsupervised Anomaly Detection and Localization with Vision
Transformer-based Encoder-Decoder
- Authors: Yunseung Lee, Pilsung Kang
- Abstract summary: We propose a vision transformer-based encoder-decoder model, named AnoViT, to reflect normal information by additionally learning the global relationship between image patches.
The proposed model performed better than the convolution-based model on three benchmark datasets.
- Score: 3.31490164885582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image anomaly detection problems aim to determine whether an image is
abnormal, and to detect anomalous areas. These methods are actively used in
various fields such as manufacturing, medical care, and intelligent
information. Encoder-decoder structures have been widely used in the field of
anomaly detection because they can easily learn normal patterns in an
unsupervised learning environment and calculate a score to identify
abnormalities through a reconstruction error indicating the difference between
input and reconstructed images. Therefore, current image anomaly detection
methods have commonly used convolutional encoder-decoders to extract normal
information through the local features of images. However, they are limited in
that only local features of the image can be utilized when constructing a
normal representation owing to the characteristics of convolution operations
using a filter of fixed size. To address this limitation, we propose a vision transformer-based
encoder-decoder model, named AnoViT, designed to reflect normal information by
additionally learning the global relationship between image patches, which is
capable of both image anomaly detection and localization. The proposed approach
constructs a feature map that maintains the existing location information of
individual patches by using the embeddings of all patches passed through
multiple self-attention layers. The proposed AnoViT model performed better than
the convolution-based model on three benchmark datasets. In MVTecAD, which is a
representative benchmark dataset for anomaly localization, it showed improved
results on 10 out of 15 classes compared with the baseline. Furthermore, the
proposed method showed good performance regardless of the class and type of the
anomalous area when localization results were evaluated qualitatively.
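The approach described above amounts to a patch-embedding encoder with global self-attention feeding a decoder that reconstructs the image, with anomalies scored and localized via the reconstruction error. Below is a minimal sketch of that general recipe; it is not the authors' exact AnoViT configuration, and the patch size, embedding dimension, depth, and decoder layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViTEncoderDecoder(nn.Module):
    """Patch-embedding encoder with global self-attention, convolutional decoder."""
    def __init__(self, img_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch_size                        # patches per side
        self.patch_embed = nn.Conv2d(3, dim, patch_size, patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Patch embeddings are reshaped back into a spatial feature map, so the
        # decoder keeps each patch's original location information.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 128, 4, 4), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 2, 2), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 2, 2), nn.Sigmoid(),
        )

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)            # global patch relations
        fmap = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.decoder(fmap)                                 # reconstructed image

def anomaly_map(model, x):
    """Per-pixel reconstruction error; its mean serves as the image-level score."""
    with torch.no_grad():
        err = (x - model(x)) ** 2
    return err.mean(dim=1)                                        # (B, H, W)

model = ViTEncoderDecoder().eval()
x = torch.rand(2, 3, 224, 224)                  # stand-in image batch
amap = anomaly_map(model, x)                    # anomaly localization map
score = amap.flatten(1).mean(dim=1)             # image-level detection score
print(amap.shape, score.shape)
```

In practice the model would be trained on normal images only (e.g., with an L2 reconstruction loss), and the resulting anomaly map would be thresholded or compared against ground-truth masks for localization.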
Related papers
- GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that applies straightforward operations, such as noise addition and shuffling, to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results on six and on-par performance on the remaining ones.
arXiv Detail & Related papers (2024-07-17T09:27:41Z)
- A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection [8.512184778338806]
It consists of a well pre-trained encoder to extract hierarchical feature representations and a decoder to reconstruct these intermediate features from the encoder.
The anomalies can be detected when the decoder fails to reconstruct features well, and then errors of hierarchical feature reconstruction are aggregated into an anomaly map to achieve anomaly localization.
Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets.
arXiv Detail & Related papers (2024-05-15T07:20:27Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to the Stable Diffusion denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic spatio-temporal pseudo-anomalies (PAs) by inpainting a masked-out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the one-class classification (OCC) setting.
Our method performs on par with existing state-of-the-art PA-generation and reconstruction-based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
- Weakly-supervised deepfake localization in diffusion-generated images [4.548755617115687]
We address a weakly-supervised localization problem, using the Xception network as the backbone architecture.
We show that the best-performing detection method (based on local scores) is less sensitive to the looser supervision than to a mismatch in dataset or generator.
arXiv Detail & Related papers (2023-11-08T10:27:36Z)
- CRADL: Contrastive Representations for Unsupervised Anomaly Detection and Localization [2.8659934481869715]
Unsupervised anomaly detection in medical imaging aims to detect and localize arbitrary anomalies without requiring anomalous data during training.
Most current state-of-the-art methods use latent variable generative models operating directly on the images.
We propose CRADL, whose core idea is to model the distribution of normal samples directly in the low-dimensional representation space of an encoder trained with a contrastive pretext task.
arXiv Detail & Related papers (2023-01-05T16:07:49Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
- Inpainting Transformer for Anomaly Detection [0.0]
Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches.
InTra achieves better than state-of-the-art results on the MVTec AD dataset for detection and localization.
arXiv Detail & Related papers (2021-04-28T17:27:44Z)
- CutPaste: Self-Supervised Learning for Anomaly Detection and Localization [59.719925639875036]
We propose a framework for building anomaly detectors using normal training data only.
We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations.
Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects.
arXiv Detail & Related papers (2021-04-08T19:04:55Z)
- Iterative energy-based projection on a normal data manifold for anomaly localization [3.785123406103385]
We propose a new approach for projecting anomalous data onto an autoencoder-learned normal data manifold.
By iteratively updating the input of the autoencoder, we bypass the loss of high-frequency information caused by the autoencoder bottleneck.
arXiv Detail & Related papers (2020-02-10T13:35:41Z)
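As a companion to the last entry above, here is a minimal sketch of the general idea of projecting an input onto an autoencoder-learned normal manifold by taking gradient steps on the input itself; the tiny autoencoder, step size, and iteration count are illustrative assumptions, not the paper's exact energy formulation.

```python
import torch
import torch.nn as nn

class TinyConvAE(nn.Module):
    """Stand-in convolutional autoencoder, assumed to be trained on normal images only."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, 2), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 2, 2), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

def project_to_normal_manifold(ae, x, steps=50, lr=0.05):
    # Iteratively update the *input* (not the weights) so that the autoencoder can
    # reconstruct it well, i.e. move it toward the learned normal-data manifold.
    x_proj = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_proj], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((ae(x_proj) - x_proj) ** 2).mean()   # reconstruction "energy"
        loss.backward()
        opt.step()
    return x_proj.detach().clamp(0, 1)

def localization_map(ae, x):
    """Per-pixel anomaly map: gap between the input and its projection."""
    return (x - project_to_normal_manifold(ae, x)).abs().mean(dim=1)   # (B, H, W)

ae = TinyConvAE().eval()                 # pretend this was trained on normal data
x = torch.rand(1, 3, 64, 64)             # stand-in test image
print(localization_map(ae, x).shape)     # torch.Size([1, 64, 64])
```

Because the update operates on the input rather than relying on a single forward pass, high-frequency detail lost in the autoencoder bottleneck is not penalized, which is the motivation stated in that entry.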
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its contents (including all information) and is not responsible for any consequences.