Related papers: LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space

URL: http://arxiv.org/abs/2410.10234v1
Date: Mon, 14 Oct 2024 07:50:56 GMT
Title: LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space
Authors: Shunsuke Sakai, Tatsuhito Hasegawa, Makoto Koshino
Abstract summary: Masked image modeling is a self-supervised learning technique that predicts the feature representation of masked regions in an image. We propose a novel approach that leverages the characteristics of MIM to detect logical anomalies effectively. We evaluate the proposed method on the MVTecLOCO dataset, achieving an average AUC of 0.867.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting anomalies such as incorrect combinations of objects or deviations in their positions is a challenging problem in industrial anomaly detection. Traditional methods mainly focus on local features of normal images, such as scratches and dirt, which makes it difficult to detect anomalies in the relationships between features. Masked image modeling (MIM) is a self-supervised learning technique that predicts the feature representation of masked regions in an image. To reconstruct the masked regions, the model must understand how the image is composed, which allows it to learn the relationships between features within the image. We propose a novel approach that leverages the characteristics of MIM to detect logical anomalies effectively. To address blurriness in the reconstructed image, we replace pixel prediction with predicting the probability distribution of discrete latent variables of the masked regions using a tokenizer. We evaluated the proposed method on the MVTecLOCO dataset, achieving an average AUC of 0.867, surpassing traditional reconstruction-based and distillation-based methods.
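
Illustrative sketch (not taken from the paper): the abstract describes predicting a probability distribution over discrete latent tokens for masked regions. One simple way to turn such predictions into an anomaly score is the negative log-likelihood of the tokenizer's ids under the predicted distribution. The function names, array shapes, and the choice of mean negative log-likelihood as the score are all assumptions made for illustration; the paper's actual scoring rule may differ.

```python
import numpy as np

def masked_token_nll(logits, token_ids, mask):
    """Per-patch negative log-likelihood of token ids at masked positions.

    logits:    (num_patches, codebook_size) predicted scores (hypothetical model output)
    token_ids: (num_patches,) discrete ids from a (hypothetical) tokenizer
    mask:      (num_patches,) bool, True where the patch was masked
    """
    # Numerically stable log-softmax over the codebook axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(token_ids)), token_ids]
    # Unmasked patches contribute no prediction error in this sketch.
    return np.where(mask, nll, 0.0)

def image_score(logits, token_ids, mask):
    """Mean NLL over masked patches; a higher value means a less expected image."""
    per_patch = masked_token_nll(logits, token_ids, mask)
    return float(per_patch[mask].mean())
```

Under this assumption, an image whose masked tokens are poorly predicted from their context (for example, an object in an unexpected position) receives a higher score than one whose tokens match the learned relationships.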

Related papers

IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR [10.763588041592703]
Unsupervised anomaly detection and segmentation methods train a model to learn the training distribution as 'normal'. Prevailing methods corrupt the images and train a model to reconstruct them. We propose IterMask3D, an iterative spatial mask-refining strategy designed for 3D brain MRI.
arXiv Detail & Related papers (2025-04-07T10:41:23Z)
Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection [66.16595174895802]
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance. In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model [59.08735812631131]
Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. We propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model.
arXiv Detail & Related papers (2023-12-10T05:13:40Z)
ISSTAD: Incremental Self-Supervised Learning Based on Transformer for Anomaly Detection and Localization [12.975540251326683]
We introduce a novel approach based on the Transformer backbone network. We train a Masked Autoencoder (MAE) model solely on normal images. In the subsequent stage, we apply pixel-level data augmentation techniques to generate corrupted normal images. This process allows the model to learn how to repair corrupted regions and classify the status of each pixel.
arXiv Detail & Related papers (2023-03-30T13:11:26Z)
PNI : Industrial Anomaly Detection using Position and Neighborhood Information [6.316693022958221]
We propose a new algorithm, PNI, which estimates the normal distribution using conditional probability given neighborhood features. We conducted experiments on the MVTec AD benchmark dataset and achieved state-of-the-art performance, with 99.56% and 98.98% AUROC scores in anomaly detection and localization.
arXiv Detail & Related papers (2022-11-22T23:45:27Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
AnoViT: Unsupervised Anomaly Detection and Localization with Vision Transformer-based Encoder-Decoder [3.31490164885582]
We propose a vision transformer-based encoder-decoder model, named AnoViT, to reflect normal information by additionally learning the global relationship between image patches. The proposed model performed better than the convolution-based model on three benchmark datasets.
arXiv Detail & Related papers (2022-03-21T09:01:37Z)
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field. We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
Anomaly localization by modeling perceptual features [3.04585143845864]
Feature-Augmented VAE is trained by reconstructing the input image in pixel space, and also in several different feature spaces. It achieves clear improvement over state-of-the-art methods on the MVTec anomaly detection and localization datasets.
arXiv Detail & Related papers (2020-08-12T15:09:13Z)
Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods. We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder. We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.