Hierarchical Vector Quantized Transformer for Multi-class Unsupervised
Anomaly Detection
- URL: http://arxiv.org/abs/2310.14228v1
- Date: Sun, 22 Oct 2023 08:20:33 GMT
- Title: Hierarchical Vector Quantized Transformer for Multi-class Unsupervised
Anomaly Detection
- Authors: Ruiying Lu, YuJie Wu, Long Tian, Dongsheng Wang, Bo Chen, Xiyang Liu,
Ruimin Hu
- Abstract summary: Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples.
This paper focuses on building a unified framework for multiple classes.
- Score: 24.11900895337062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised image Anomaly Detection (UAD) aims to learn robust and
discriminative representations of normal samples. While training a separate
model per class entails expensive computation and limits generalizability, this
paper focuses on building a unified framework for multiple classes. Under such a
challenging setting, popular reconstruction-based networks with continuous
latent representation assumption always suffer from the "identical shortcut"
issue, where both normal and abnormal samples can be well recovered and
difficult to distinguish. To address this pivotal issue, we propose a
hierarchical vector quantized prototype-oriented Transformer under a
probabilistic framework. First, instead of learning the continuous
representations, we preserve the typical normal patterns as discrete iconic
prototypes, and confirm the importance of Vector Quantization in preventing the
model from falling into the shortcut. The vector quantized iconic prototype is
integrated into the Transformer for reconstruction, such that an abnormal data
point is reconstructed as a normal one. Second, we design a hierarchical
framework to alleviate the codebook collapse issue and replenish fragile normal
patterns. Third, a prototype-oriented optimal transport method is
proposed to better regulate the prototypes and hierarchically evaluate the
anomaly score. Evaluated on the MVTec-AD and VisA datasets, our model surpasses
state-of-the-art alternatives and offers good interpretability. The code is
available at
https://github.com/RuiyingLu/HVQ-Trans.
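The core idea of the abstract can be illustrated with a minimal sketch of the vector-quantization step: continuous features are snapped to their nearest discrete prototype, so patterns far from every normal prototype incur a large quantization error. The names (`quantize`, `codebook`) and the scoring rule below are illustrative assumptions, not the authors' actual API or model.

```python
import numpy as np

rng = np.random.default_rng(0)
# 8 hypothetical "iconic prototypes" for 4-dimensional features.
codebook = rng.normal(size=(8, 4))

def quantize(features, codebook):
    """Replace each feature vector with its nearest prototype."""
    # Pairwise squared distances, shape (N, K).
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

features = rng.normal(size=(5, 4))
quantized, idx = quantize(features, codebook)

# The distance to the nearest prototype acts as a crude anomaly signal:
# normal patterns lie near some prototype, abnormal ones do not.
score = np.linalg.norm(features - quantized, axis=1)
```

In the paper this quantization feeds a Transformer reconstruction and the scoring is done hierarchically with optimal transport; the sketch only shows why discrete prototypes block the "identical shortcut" that continuous latents allow.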
Related papers
- Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection [6.865429486202104]
We introduce Prior Normality Prompt Transformer (PNPT) for multi-class anomaly detection.
PNPT strategically incorporates normal semantics prompting to mitigate the "identical mapping" problem.
This entails integrating a prior normality prompt into the reconstruction process, yielding a dual-stream model.
arXiv Detail & Related papers (2024-06-17T13:10:04Z)
- MLAD: A Unified Model for Multi-system Log Anomaly Detection [35.68387377240593]
We propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems.
Specifically, we employ Sentence-BERT to capture the similarities between log sequences and convert them into high-dimensional learnable semantic vectors.
We revamp the formulas of the Attention layer to discern the significance of each keyword in the sequence and model the overall distribution of the multi-system dataset.
arXiv Detail & Related papers (2024-01-15T12:51:13Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
- Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation [3.9318191265352196]
We propose a semi-supervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model has surpassed the state-of-the-art models remarkably.
arXiv Detail & Related papers (2021-11-10T07:07:54Z)
- A Closer Look at Prototype Classifier for Few-shot Image Classification [28.821731837776593]
We show that a prototype classifier works equally well without fine-tuning and meta-learning.
We derive a novel generalization bound for the prototypical network and show that focusing on the variance of the norm of a feature vector can improve performance.
arXiv Detail & Related papers (2021-10-11T08:28:43Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.