Masked Transformer for image Anomaly Localization
- URL: http://arxiv.org/abs/2210.15540v1
- Date: Thu, 27 Oct 2022 15:30:48 GMT
- Title: Masked Transformer for image Anomaly Localization
- Authors: Axel De Nardin, Pankaj Mishra, Gian Luca Foresti, Claudio Piciarelli
- Abstract summary: We propose a new model for image anomaly detection based on the Vision Transformer architecture with patch masking.
We show that multi-resolution patches and their collective embeddings provide a large improvement in the model's performance.
The proposed model has been tested on popular anomaly detection datasets such as MVTec and head CT.
- Score: 14.455765147827345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image anomaly detection consists in detecting images or image portions that
are visually different from the majority of the samples in a dataset. The task
is of practical importance for various real-life applications like biomedical
image analysis, visual inspection in industrial production, banking, traffic
management, etc. Most of the current deep learning approaches rely on image
reconstruction: the input image is projected into some latent space and then
reconstructed, assuming that the network (mostly trained on normal data) will
not be able to reconstruct the anomalous portions. However, this assumption
does not always hold. We thus propose a new model based on the Vision
Transformer architecture with patch masking: the input image is split into
several patches, and each patch is reconstructed only from the surrounding
data, thus ignoring the potentially anomalous information contained in the
patch itself. We then show that multi-resolution patches and their collective
embeddings provide a large improvement in the model's performance compared to
the exclusive use of the traditional square patches. The proposed model has
been tested on popular anomaly detection datasets such as MVTec and head CT and
achieved good results when compared to other state-of-the-art approaches.
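The masking idea described in the abstract (score each patch by how well it can be predicted from its surroundings alone) can be illustrated with a minimal sketch. This is not the authors' model: the Vision Transformer is replaced here by a hypothetical placeholder predictor that averages the neighboring patches, and the function names (`split_into_patches`, `predict_from_neighbors`, `anomaly_map`), the patch size, and the toy image are all assumptions made for illustration.

```python
from statistics import mean


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into a grid of patch x patch blocks."""
    rows, cols = len(image) // patch, len(image[0]) // patch
    return [[[row[c * patch:(c + 1) * patch]
              for row in image[r * patch:(r + 1) * patch]]
             for c in range(cols)]
            for r in range(rows)]


def predict_from_neighbors(grid, r, c):
    """Placeholder for the Transformer: pixel-wise mean of the surrounding
    patches. The masked patch (r, c) itself is never read.
    Assumes the grid holds at least two patches."""
    rows, cols = len(grid), len(grid[0])
    neighbors = [grid[i][j]
                 for i in range(max(0, r - 1), min(rows, r + 2))
                 for j in range(max(0, c - 1), min(cols, c + 2))
                 if (i, j) != (r, c)]
    p = len(grid[r][c])
    return [[mean(n[y][x] for n in neighbors) for x in range(p)]
            for y in range(p)]


def anomaly_map(image, patch):
    """Per-patch mean squared error between each patch and its prediction."""
    grid = split_into_patches(image, patch)
    scores = []
    for r in range(len(grid)):
        row_scores = []
        for c in range(len(grid[0])):
            pred = predict_from_neighbors(grid, r, c)
            actual = grid[r][c]
            row_scores.append(mean((actual[y][x] - pred[y][x]) ** 2
                                   for y in range(patch)
                                   for x in range(patch)))
        scores.append(row_scores)
    return scores


# Toy example: a uniform 8x8 image with one bright 2x2 "defect".
image = [[0.5] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (4, 5):
        image[y][x] = 1.0

scores = anomaly_map(image, patch=2)
for row in scores:
    print(["%.3f" % s for s in row])
```

Because the defective patch is excluded from its own prediction, its error stays high instead of being reproduced by the predictor, which is the motivation the abstract gives for masking. A learned model would replace `predict_from_neighbors` in practice.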
Related papers
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - PatchNR: Learning from Small Data by Patch Normalizing Flow
Regularization [57.37911115888587]
We introduce a regularizer for the variational modeling of inverse problems in imaging based on normalizing flows.
Our regularizer, called patchNR, involves a normalizing flow learned on patches of very few images.
arXiv Detail & Related papers (2022-05-24T12:14:26Z) - AnoViT: Unsupervised Anomaly Detection and Localization with Vision
Transformer-based Encoder-Decoder [3.31490164885582]
We propose a vision transformer-based encoder-decoder model, named AnoViT, to reflect normal information by additionally learning the global relationship between image patches.
The proposed model performed better than the convolution-based model on three benchmark datasets.
arXiv Detail & Related papers (2022-03-21T09:01:37Z) - HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z) - Self-Supervised Predictive Convolutional Attentive Block for Anomaly
Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z) - A Hierarchical Transformation-Discriminating Generative Model for Few
Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z) - Inpainting Transformer for Anomaly Detection [0.0]
Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches.
InTra achieves better than state-of-the-art results on the MVTec AD dataset for detection and localization.
arXiv Detail & Related papers (2021-04-28T17:27:44Z) - CutPaste: Self-Supervised Learning for Anomaly Detection and
Localization [59.719925639875036]
We propose a framework for building anomaly detectors using normal training data only.
We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations.
Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects.
arXiv Detail & Related papers (2021-04-08T19:04:55Z) - Image Anomaly Detection by Aggregating Deep Pyramidal Representations [16.246831343527052]
Anomaly detection consists in identifying, within a dataset, those samples that significantly differ from the majority of the data.
This paper focuses on image anomaly detection using a deep neural network with multiple pyramid levels to analyze the image features at different scales.
arXiv Detail & Related papers (2020-11-12T09:58:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.