Related papers: ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection

ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection

URL: http://arxiv.org/abs/2510.18342v1
Date: Tue, 21 Oct 2025 06:51:30 GMT
Title: ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection
Authors: Peng Tang, Xiaoxiao Yan, Xiaobin Hu, Yuning Cui, Donghao Luo, Jiangning Zhang, Pengcheng Xu, Jinlong Peng, Qingdong He, Feiyue Huang, Song Xue, Tobias Lasser,
Abstract summary: ShortcutBreaker is a novel unified feature-reconstruction framework for MUAD tasks.<n>It features two key innovations to address the issue of shortcuts.<n>The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on four datasets.
Score: 59.89803740308262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-class unsupervised anomaly detection (MUAD) has garnered growing research interest, as it seeks to develop a unified model for anomaly detection across multiple classes, i.e., eliminating the need to train separate models for distinct objects and thereby saving substantial computational resources. Under the MUAD setting, while advanced Transformer-based architectures have brought significant performance improvements, identity shortcuts persist: they directly copy inputs to outputs, narrowing the gap in reconstruction errors between normal and abnormal cases, and thereby making the two harder to distinguish. Therefore, we propose ShortcutBreaker, a novel unified feature-reconstruction framework for MUAD tasks, featuring two key innovations to address the issue of shortcuts. First, drawing on matrix rank inequality, we design a low-rank noisy bottleneck (LRNB) to project highdimensional features into a low-rank latent space, and theoretically demonstrate its capacity to prevent trivial identity reproduction. Second, leveraging ViTs global modeling capability instead of merely focusing on local features, we incorporate a global perturbation attention to prevent information shortcuts in the decoders. Extensive experiments are performed on four widely used anomaly detection benchmarks, including three industrial datasets (MVTec-AD, ViSA, and Real-IAD) and one medical dataset (Universal Medical). The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on these four datasets, respectively, consistently outperforming previous MUAD methods across different scenarios.

Related papers

CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data.<n>We propose CLIPfusion, a method that leverages both discriminative and generative foundation models.<n>We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free [81.65559031466452]
We conduct experiments to investigate gating-augmented softmax attention variants.<n>We find that a simple modification-applying a head-specific sigmoid gate after the Scaled Dot-Product Attention (SDPA)-consistently improves performance.
arXiv Detail & Related papers (2025-05-10T17:15:49Z)
Search is All You Need for Few-shot Anomaly Detection [39.737510049667556]
Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection.<n>We show that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios.<n>Our method achieves remarkable image-level AUROC scores of 97.4%, 94.8%, and 70.8% respectively.
arXiv Detail & Related papers (2025-04-16T09:21:34Z)
Multimodal Task Representation Memory Bank vs. Catastrophic Forgetting in Anomaly Detection [6.991692485111346]
Unsupervised Continuous Anomaly Detection (UCAD) faces significant challenges in multi-task representation learning.<n>We propose the Multimodal Task Representation Memory Bank (MTRMB) method through two key technical innovations.<n> Experiments on MVtec AD and VisA datasets demonstrate MTRMB's superiority, achieving an average detection accuracy of 0.921 at the lowest forgetting rate.
arXiv Detail & Related papers (2025-02-10T06:49:54Z)
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection [31.028622674616134]
We introduce a reconstruction-based anomaly detection framework, namely Dinomaly.<n>Our proposed Dinomaly achieves impressive image-level AU achieves 99.6%, 98.7%, and 89.3% on the three datasets respectively.
arXiv Detail & Related papers (2024-05-23T08:55:20Z)
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination. This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field. Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD) By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder. MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z)
Continuous Memory Representation for Anomaly Detection [24.58611060347548]
CRAD is a novel anomaly detection method for representing normal features within a "continuous" memory. In an evaluation using the MVTec AD dataset, CRAD significantly outperforms the previous state-of-the-art method by reducing 65.0% of the error for multi-class unified anomaly detection.
arXiv Detail & Related papers (2024-02-28T12:38:44Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature. It uses a single-scale feature map and global cross-attention calculations without specific locality constraints. We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
A Unified Model for Multi-class Anomaly Detection [33.534990722449066]
UniAD accomplishes anomaly detection for multiple classes with a unified framework. We evaluate our algorithm on MVTec-AD and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-06-08T06:05:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.