Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation
- URL: http://arxiv.org/abs/2512.09446v1
- Date: Wed, 10 Dec 2025 09:19:17 GMT
- Title: Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation
- Authors: Nadeem Nazer, Hongkuan Zhou, Lavdim Halilaj, Ylli Sadikaj, Steffen Staab,
- Abstract summary: We introduce DAPO, a novel approach for Defect-aware Prompt Optimization based on progressive tuning for the zero-shot multi-type and binary anomaly detection and segmentation under distribution shifts.<n>Our approach aligns anomaly-relevant image features with their corresponding text semantics by learning hybrid defect-aware prompts with both fixed textual anchors and learnable token embeddings.
- Score: 12.030059666003972
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent vision language models (VLMs) like CLIP have demonstrated impressive anomaly detection performance under significant distribution shift by utilizing high-level semantic information through text prompts. However, these models often neglect fine-grained details, such as which kind of anomalies, like "hole", "cut", "scratch" that could provide more specific insight into the nature of anomalies. We argue that recognizing fine-grained anomaly types 1) enriches the representation of "abnormal" with structured semantics, narrowing the gap between coarse anomaly signals and fine-grained defect categories; 2) enables manufacturers to understand the root causes of the anomaly and implement more targeted and appropriate corrective measures quickly. While incorporating such detailed semantic information is crucial, designing handcrafted prompts for each defect type is both time-consuming and susceptible to human bias. For this reason, we introduce DAPO, a novel approach for Defect-aware Prompt Optimization based on progressive tuning for the zero-shot multi-type and binary anomaly detection and segmentation under distribution shifts. Our approach aligns anomaly-relevant image features with their corresponding text semantics by learning hybrid defect-aware prompts with both fixed textual anchors and learnable token embeddings. We conducted experiments on public benchmarks (MPDD, VisA, MVTec-AD, MAD, and Real-IAD) and an internal dataset. The results suggest that compared to the baseline models, DAPO achieves a 3.7% average improvement in AUROC and average precision metrics at the image level under distribution shift, and a 6.5% average improvement in localizing novel anomaly types under zero-shot settings.
Related papers
- HAAF: Hierarchical Adaptation and Alignment of Foundation Models for Few-Shot Pathology Anomaly Detection [10.649984141835189]
We propose the Hierarchical Adaptation and Alignment Framework (HAAF)<n>At its core is a novel Cross-Level Scaled Alignment mechanism that enforces a sequential calibration order.<n>A dual-branch inference strategy integrates semantic scores with geometric prototypes to ensure stability in few-shot settings.
arXiv Detail & Related papers (2026-01-24T10:31:21Z) - DevPrompt: Deviation-Based Prompt Learning for One-Normal ShotImage Anomaly Detection [0.0]
Few-normal shot anomaly detection (FNSAD) aims to detect abnormal regions in images using only a few normal training samples.<n>Recent approaches leverage vision-language models such as CLIP with prompt-based learning to align image and text features.<n>We propose a deviation-guided prompt learning framework that integrates the semantic power of vision-language models with the statistical reliability of deviation-based scoring.
arXiv Detail & Related papers (2026-01-21T20:35:51Z) - Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time [60.341117019125214]
We propose a lightweight and plug-and-play Test-time adaptation framework for correcting Unseen Normal pattErns in graph anomaly detection (GAD)<n>To address semantic confusion, a graph aligner is employed to align the shifted data to the original one at the graph attribute level.<n>Extensive experiments on 10 real-world datasets demonstrate that TUNE significantly enhances the generalizability of pre-trained GAD models to both synthetic and real unseen normal patterns.
arXiv Detail & Related papers (2025-11-10T12:10:05Z) - AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration [12.642531824086639]
Zero-Shot Anomaly Detection (ZSAD) seeks to identify anomalies from arbitrary novel categories.<n>Recent vision foundation models such as DINOv3 have demonstrated strong transferable representation capabilities.<n>We introduce AD-DINOv3, a novel vision-language multimodal framework designed for ZSAD.
arXiv Detail & Related papers (2025-09-17T15:29:25Z) - Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly detection is critical in fields such as medical diagnostics and industrial defect detection.<n> CLIP's coarse-grained image-text alignment limits localization and detection performance for fine-grained anomalies.<n>Crane improves the state-of-the-art ZSAD from 2% to 28%, at both image and pixel levels, while remaining competitive in inference speed.
arXiv Detail & Related papers (2025-04-15T10:42:25Z) - Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [109.72772150095646]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for accurate ZSAD.<n>Experiments on 19 real-world datasets, covering both industrial defects and medical anomalies, demonstrate that FAPrompt substantially outperforms state-of-the-art methods in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z) - Human-Free Automated Prompting for Vision-Language Anomaly Detection: Prompt Optimization with Meta-guiding Prompt Scheme [19.732769780675977]
Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning.
Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types.
Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through data-driven methods.
arXiv Detail & Related papers (2024-06-26T09:29:05Z) - FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization [31.854923603517264]
We propose a novel zero-shot anomaly detection (ZSAD) method called FiLo.
FiLo comprises two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High- quality localization (HQ-Loc)
Experimental results on datasets like MVTec and VisA demonstrate that FiLo significantly improves the performance of ZSAD in both detection and localization.
arXiv Detail & Related papers (2024-04-21T14:22:04Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - SLA$^2$P: Self-supervised Anomaly Detection with Adversarial
Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed as SLA$2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.