One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
- URL: http://arxiv.org/abs/2601.05552v1
- Date: Fri, 09 Jan 2026 06:05:18 GMT
- Title: One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
- Authors: Bin-Bin Gao, Chengjie Wang
- Abstract summary: Universal visual anomaly detection (AD) aims to identify anomaly images and segment anomaly regions towards open and dynamic scenarios. Current methods often struggle with complex prompt engineering, elaborate adaptation modules, and challenging training strategies. This paper presents an embarrassingly simple, general, and effective framework for Universal vision Anomaly Detection (UniADet).
- Score: 65.11602552904456
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Universal visual anomaly detection (AD) aims to identify anomaly images and segment anomaly regions towards open and dynamic scenarios, following zero- and few-shot paradigms without any dataset-specific fine-tuning. Recent approaches have made significant progress through the wide use of visual-language foundation models. However, current methods often struggle with complex prompt engineering, elaborate adaptation modules, and challenging training strategies, ultimately limiting their flexibility and generality. To address these issues, this paper rethinks the fundamental mechanism behind visual-language models for AD and presents an embarrassingly simple, general, and effective framework for Universal vision Anomaly Detection (UniADet). Specifically, we first find that the language encoder is used to derive decision weights for anomaly classification and segmentation, and then demonstrate that it is unnecessary for universal AD. Second, we propose an embarrassingly simple method to completely decouple classification and segmentation, and to decouple cross-level features, i.e., learning independent weights for different tasks and hierarchical features. UniADet is highly simple (learning only decoupled weights), parameter-efficient (only 0.002M learnable parameters), general (adapting to a variety of foundation models), and effective (surpassing state-of-the-art zero-/few-shot methods by a large margin, and even full-shot AD methods for the first time) on 14 real-world AD benchmarks covering both industrial and medical domains. We will make the code and model of UniADet available at https://github.com/gaobb/UniADet.
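The decoupling idea in the abstract (independent decision weights per task and per feature level, with no language encoder) can be illustrated with a toy sketch. This is not the authors' implementation: the cosine-similarity scoring, the softmax temperature of 0.07, the averaging across levels, the random stand-in features, and all function names (`anomaly_prob`, `init_weights`, `score_image`) are assumptions made for illustration only.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def anomaly_prob(feature, weights, temperature=0.07):
    """Softmax over two cosine logits; returns P(anomalous).

    `weights` is a (w_normal, w_anomalous) pair. The temperature is an
    assumed value, not one taken from the paper.
    """
    logits = [cosine(feature, w) / temperature for w in weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return exps[1] / sum(exps)


def init_weights(num_levels, dim, rng):
    """One learnable (normal, anomalous) weight pair per feature level."""
    return [
        tuple([rng.gauss(0, 1) for _ in range(dim)] for _ in range(2))
        for _ in range(num_levels)
    ]


def score_image(global_feats, patch_feats, cls_weights, seg_weights):
    """Image score from global features, patch scores from patch features.

    Classification and segmentation use separate weights, and each
    feature level has its own pair (the "decoupled" part). Averaging
    across levels is an assumption of this sketch.
    """
    num_levels = len(global_feats)
    image_score = sum(
        anomaly_prob(g, w) for g, w in zip(global_feats, cls_weights)
    ) / num_levels
    num_patches = len(patch_feats[0])
    patch_scores = [
        sum(
            anomaly_prob(patch_feats[lvl][p], seg_weights[lvl])
            for lvl in range(num_levels)
        ) / num_levels
        for p in range(num_patches)
    ]
    return image_score, patch_scores


# Toy sizes; random vectors stand in for frozen foundation-model features.
rng = random.Random(0)
NUM_LEVELS, DIM, NUM_PATCHES = 4, 8, 9
cls_weights = init_weights(NUM_LEVELS, DIM, rng)  # classification weights
seg_weights = init_weights(NUM_LEVELS, DIM, rng)  # segmentation weights
global_feats = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LEVELS)]
patch_feats = [
    [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
    for _ in range(NUM_LEVELS)
]

image_score, patch_scores = score_image(
    global_feats, patch_feats, cls_weights, seg_weights
)
# Learnable parameters in this toy setup: 2 tasks x levels x 2 classes x dim.
num_params = 2 * NUM_LEVELS * 2 * DIM
```

In this sketch the only trainable quantities are the weight pairs, so the parameter count grows with the number of levels and the feature dimension rather than with the size of the backbone. The toy dimensions here are arbitrary and are not meant to reproduce the parameter count reported in the abstract.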
Related papers
- ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models [14.804039283733475]
Anomaly detection is a fundamental task of critical importance across numerous domains.
We introduce a novel paradigm: In-Context Anomaly Detection (ICAD), where anomalies are defined by their dissimilarity to a relevant reference set of normal samples.
Under this paradigm, we propose ICAD-LLM, a unified AD framework leveraging Large Language Models' in-context learning abilities to process heterogeneous data within a single model.
arXiv Detail & Related papers (2025-12-01T13:41:30Z)
- Generalist Multi-Class Anomaly Detection via Distillation to Two Heterogeneous Student Networks [11.543429175824905]
Anomaly detection plays an important role in various real-world applications.
Recent methods have attempted to address general anomaly detection, but their performance remains sensitive to dataset-specific settings and single-class tasks.
We propose a novel dual-model ensemble approach based on knowledge distillation (KD) to bridge this gap.
arXiv Detail & Related papers (2025-09-29T08:31:31Z)
- ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning [52.11294707895649]
This paper explores the problem of class-agnostic anomaly detection (AD).
The objective is to train one class-agnostic AD model that can generalize to detect anomalies in diverse new classes from different domains without any retraining or fine-tuning on the target data.
Comprehensive experiments on eight real-world AD datasets demonstrate that our ResAD++ can achieve remarkable AD results when directly used in new classes.
arXiv Detail & Related papers (2025-09-28T08:41:05Z)
- NeuCoReClass AD: Redefining Self-Supervised Time Series Anomaly Detection [0.8349690795786082]
We introduce NeuCoReClass AD, a self-supervised multi-task time series anomaly detection framework.
Our method employs neural transformation learning to generate augmented views that are informative, diverse, and coherent, without requiring domain-specific knowledge.
arXiv Detail & Related papers (2025-07-29T15:04:05Z)
- MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning [4.887838886202545]
We present a novel paradigm that unifies anomaly segmentation into change segmentation.
We propose a one-prompt Meta-learning framework for Universal Anomaly Segmentation (MetaUAS).
Our method effectively and efficiently segments any anomalies with only one normal image prompt.
arXiv Detail & Related papers (2025-05-14T10:25:26Z)
- Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to overfit to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.
One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.
By freezing the principal components and adapting only the remaining components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
- Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
- Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection [128.40330044868293]
Vision Transformer (ViT), showcasing a more straightforward architecture, has proven effective in multiple domains.
ViTAD achieves state-of-the-art results and efficiency on MVTec AD, VisA, and Uni-Medical datasets.
arXiv Detail & Related papers (2023-12-12T18:28:59Z)
- Aligning and Prompting Everything All at Once for Universal Visual Perception [79.96124061108728]
APE is a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks.
APE advances the convergence of detection and grounding by reformulating language-guided grounding as open-vocabulary detection.
Experiments on over 160 datasets demonstrate that APE outperforms state-of-the-art models.
arXiv Detail & Related papers (2023-12-04T18:59:50Z)
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification [59.326456778057384]
We propose the Memory-based Multi-Source Meta-Learning framework to train a generalizable model for unseen domains.
We also present a meta batch normalization layer (MetaBN) to diversify meta-test features.
Experiments demonstrate that our M$^3$L can effectively enhance the generalization ability of the model for unseen domains.
arXiv Detail & Related papers (2020-12-01T11:38:16Z)