Related papers: Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

URL: http://arxiv.org/abs/2405.14325v2
Date: Wed, 29 May 2024 08:57:31 GMT
Title: Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Authors: Jia Guo, Shuai Lu, Weihang Zhang, Huiqi Li
Abstract summary: In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly. Our proposed Dinomaly achieves impressive image AUROC of 99.6%, 98.7%, and 89.3% on three datasets respectively.
Score: 29.370142078092375
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Recent studies highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images, serving as an alternative to the conventional one-class-one-model setup. Despite various advancements addressing this challenging task, the detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting of only Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across three popular anomaly detection benchmarks including MVTec-AD, VisA, and the recently released Real-IAD. Our proposed Dinomaly achieves impressive image AUROC of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods, but also surpasses the most advanced class-separated UAD records.
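
For readers unfamiliar with the image AUROC metric quoted in the abstract, the following is a minimal sketch of how such a score can be computed from per-image anomaly scores. It is not the authors' evaluation code: the function name `image_auroc` and the example scores and labels are hypothetical, chosen only for illustration. It uses the standard pairwise interpretation of AUROC, namely the probability that a randomly chosen anomalous image receives a higher score than a randomly chosen normal image, with ties counted as one half.

```python
def image_auroc(scores, labels):
    """Pairwise AUROC: fraction of (anomalous, normal) pairs ranked correctly.

    scores: per-image anomaly scores (higher means more anomalous).
    labels: 1 for anomalous images, 0 for normal images.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous image correctly ranked above normal one
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six test images (illustrative values, not from the paper).
scores = [0.10, 0.20, 0.35, 0.80, 0.30, 0.90]
labels = [0, 0, 0, 1, 1, 1]
auroc = image_auroc(scores, labels)
print(f"image AUROC = {auroc:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

The quadratic loop is kept for readability; on real benchmarks a rank-based implementation from an established library would normally be used instead.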

Related papers

Search is All You Need for Few-shot Anomaly Detection [39.737510049667556]
Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection. We show that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios. Our method achieves remarkable image-level AUROC scores of 97.4%, 94.8%, and 70.8% respectively.
arXiv Detail & Related papers (2025-04-16T09:21:34Z)
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning [18.268054258939213]
We introduce linear probing evaluation to the multi-modal detectors and rethink the multi-modal object detection task. We construct a novel framework called M$2$D-LIF, which consists of the Mono-Modality Distillation (M$2$D) method and the Local Illumination-aware Fusion (LIF) module. Our M$2$D-LIF effectively mitigates the Fusion Degradation phenomenon and outperforms the previous SOTA detectors.
arXiv Detail & Related papers (2025-03-14T18:15:53Z)
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection [68.26282316080558]
Current open-world detectors can recognize a broader range of vocabularies, despite being trained on limited categories. We introduce Prova, a prototype classifier for vast-vocabulary object detection.
arXiv Detail & Related papers (2024-12-23T18:57:43Z)
Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection [57.666055329221194]
We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks. Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
arXiv Detail & Related papers (2024-09-07T05:31:24Z)
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2 [16.69402464709241]
We adapt DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. Our proposed vision-only approach, AnomalyDINO, is based on patch similarities and enables both image-level anomaly prediction and pixel-level anomaly segmentation. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%).
arXiv Detail & Related papers (2024-05-23T13:15:13Z)
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination. This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field. Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD). By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder. MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Generating and Reweighting Dense Contrastive Patterns for Unsupervised Anomaly Detection [59.34318192698142]
We introduce a prior-less anomaly generation paradigm and develop an innovative unsupervised anomaly detection framework named GRAD. PatchDiff effectively exposes various types of anomaly patterns. Experiments on both MVTec AD and MVTec LOCO datasets also support the aforementioned observation.
arXiv Detail & Related papers (2023-12-26T07:08:06Z)
Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection [128.40330044868293]
Vision Transformer (ViT), showcasing a more straightforward architecture, has proven effective in multiple domains. ViTAD achieves state-of-the-art results and efficiency on MVTec AD, VisA, and Uni-Medical datasets.
arXiv Detail & Related papers (2023-12-12T18:28:59Z)
Anomaly Detection via Multi-Scale Contrasted Memory [3.0170109896527086]
We introduce a new two-stage anomaly detector which memorizes multi-scale normal prototypes during training to compute an anomaly deviation score. Our model substantially improves the state-of-the-art performance on a wide range of object, style and local anomalies, with up to 35% relative error improvement on CIFAR-10.
arXiv Detail & Related papers (2022-11-16T16:58:04Z)
A Unified Model for Multi-class Anomaly Detection [33.534990722449066]
UniAD accomplishes anomaly detection for multiple classes with a unified framework. We evaluate our algorithm on MVTec-AD and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-06-08T06:05:09Z)
Anomaly Detection via Reverse Distillation from One-Class Embedding [2.715884199292287]
We propose a novel T-S model consisting of a teacher encoder and a student decoder. Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input. In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
arXiv Detail & Related papers (2022-01-26T01:48:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.