Related papers: AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration

URL: http://arxiv.org/abs/2509.14084v2
Date: Thu, 18 Sep 2025 02:19:00 GMT
Title: AD-DINOv3: Enhancing DINOv3 for Zero-Shot Anomaly Detection with Anomaly-Aware Calibration
Authors: Jingyi Yuan, Jianxiong Ye, Wenkang Chen, Chenqiang Gao
Abstract summary: Zero-Shot Anomaly Detection (ZSAD) seeks to identify anomalies from arbitrary novel categories. Recent vision foundation models such as DINOv3 have demonstrated strong transferable representation capabilities. We introduce AD-DINOv3, a novel vision-language multimodal framework designed for ZSAD.
Score: 12.642531824086639
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-Shot Anomaly Detection (ZSAD) seeks to identify anomalies from arbitrary novel categories, offering a scalable and annotation-efficient solution. Traditionally, most ZSAD works have been based on the CLIP model, which performs anomaly detection by calculating the similarity between visual and text embeddings. Recently, vision foundation models such as DINOv3 have demonstrated strong transferable representation capabilities. In this work, we are the first to adapt DINOv3 for ZSAD. However, this adaptation presents two key challenges: (i) the domain bias between large-scale pretraining data and anomaly detection tasks leads to feature misalignment; and (ii) the inherent bias toward global semantics in pretrained representations often leads to subtle anomalies being misinterpreted as part of the normal foreground objects, rather than being distinguished as abnormal regions. To overcome these challenges, we introduce AD-DINOv3, a novel vision-language multimodal framework designed for ZSAD. Specifically, we formulate anomaly detection as a multimodal contrastive learning problem, where DINOv3 is employed as the visual backbone to extract patch tokens and a CLS token, and the CLIP text encoder provides embeddings for both normal and abnormal prompts. To bridge the domain gap, lightweight adapters are introduced in both modalities, enabling their representations to be recalibrated for the anomaly detection task. Beyond this baseline alignment, we further design an Anomaly-Aware Calibration Module (AACM), which explicitly guides the CLS token to attend to anomalous regions rather than generic foreground semantics, thereby enhancing discriminability. Extensive experiments on eight industrial and medical benchmarks demonstrate that AD-DINOv3 consistently matches or surpasses state-of-the-art methods. The code will be available at https://github.com/Kaisor-Yuan/AD-DINOv3.
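The abstract frames ZSAD as comparing visual tokens with embeddings of a normal and an abnormal prompt, after passing both modalities through lightweight adapters. Below is a minimal, stdlib-only sketch of that generic CLIP-style scoring step; it is not the authors' implementation. The residual adapter form, its zero initialisation, the temperature of 0.07, the assumption that both modalities already share one embedding dimension, and taking the image-level score from the CLS token alone are all assumptions. The AACM is not modelled, and real DINOv3 patch tokens and CLIP text embeddings are replaced by toy 2-D vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def residual_adapter(x: Sequence[float], weight: Sequence[Sequence[float]],
                     alpha: float = 0.1) -> Vector:
    """Hypothetical lightweight adapter: x + alpha * (W @ x) with a square W."""
    projected = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return [xi + alpha * pi for xi, pi in zip(x, projected)]


def abnormal_probability(feature: Sequence[float], normal_text: Sequence[float],
                         abnormal_text: Sequence[float],
                         temperature: float = 0.07) -> float:
    """Two-way softmax over cosine similarities to the two prompt embeddings."""
    s_normal = cosine(feature, normal_text) / temperature
    s_abnormal = cosine(feature, abnormal_text) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def score_image(patch_tokens, cls_token, normal_text, abnormal_text,
                adapt_visual, adapt_text) -> Tuple[List[float], float]:
    """Per-patch anomaly map plus an image score taken (by assumption) from the CLS token."""
    t_normal = adapt_text(normal_text)
    t_abnormal = adapt_text(abnormal_text)
    anomaly_map = [abnormal_probability(adapt_visual(p), t_normal, t_abnormal)
                   for p in patch_tokens]
    image_score = abnormal_probability(adapt_visual(cls_token), t_normal, t_abnormal)
    return anomaly_map, image_score


# Toy usage: zero-initialised adapters behave as the identity.
ZERO_W = [[0.0, 0.0], [0.0, 0.0]]


def identity_adapter(v: Sequence[float]) -> Vector:
    return residual_adapter(v, ZERO_W)


patches = [[0.9, 0.1], [0.1, 0.9]]  # first looks "normal", second "abnormal"
anomaly_map, image_score = score_image(
    patches, [0.5, 0.5], [1.0, 0.0], [0.0, 1.0], identity_adapter, identity_adapter)
```

Zero-initialising `W` makes each adapter start as the identity, so training would begin from the frozen backbone features. This is a common choice for residual adapters and is assumed here rather than taken from the paper.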

Related papers

Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation [12.030059666003972]
We introduce DAPO, a novel approach for Defect-aware Prompt Optimization based on progressive tuning for the zero-shot multi-type and binary anomaly detection and segmentation under distribution shifts. Our approach aligns anomaly-relevant image features with their corresponding text semantics by learning hybrid defect-aware prompts with both fixed textual anchors and learnable token embeddings.
arXiv Detail & Related papers (2025-12-10T09:19:17Z)
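The DAPO summary describes prompts that combine fixed textual anchors with learnable token embeddings. One way to picture such a hybrid prompt, in the style of context-optimisation prompt tuning, is sketched below. The class name `HybridPrompt`, the mean-pooling stand-in for a frozen text encoder, and the plain SGD update are hypothetical; DAPO's progressive tuning schedule is not reproduced.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class HybridPrompt:
    """Hypothetical hybrid prompt: trainable context tokens followed by frozen anchor tokens."""

    learnable: List[Vector]  # updated during prompt tuning
    anchor: List[Vector]     # frozen word embeddings of a fixed defect phrase

    def tokens(self) -> List[Vector]:
        # The token sequence that would be fed to a (frozen) text encoder.
        return self.learnable + self.anchor

    def trainable_mask(self) -> List[bool]:
        # True for positions whose embeddings receive gradient updates.
        return [True] * len(self.learnable) + [False] * len(self.anchor)

    def sgd_step(self, grads: List[Vector], lr: float) -> None:
        """One SGD step; grads align with tokens(), anchor gradients are ignored."""
        for i, vec in enumerate(self.learnable):
            self.learnable[i] = [v - lr * g for v, g in zip(vec, grads[i])]

    def encode(self) -> Vector:
        """Placeholder for a text encoder (assumption): mean-pool the token embeddings."""
        toks = self.tokens()
        return [sum(col) / len(toks) for col in zip(*toks)]


prompt = HybridPrompt(learnable=[[0.0, 0.0]], anchor=[[1.0, 1.0]])
prompt.sgd_step([[1.0, 2.0], [5.0, 5.0]], lr=0.5)
```

After the step only the learnable token moves (to `[-0.5, -1.0]`); the anchor embedding stays fixed, which is what keeps the prompt tied to its textual meaning.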
Unified Unsupervised Anomaly Detection via Matching Cost Filtering [113.43366521994396]
Unsupervised anomaly detection (UAD) aims to identify image- and pixel-level anomalies using only normal training data. We present Unified Cost Filtering (UCF), a generic post-hoc refinement framework for refining the anomaly cost volume of any UAD model.
arXiv Detail & Related papers (2025-10-03T03:28:18Z)
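UCF operates on an anomaly cost volume, i.e. matching costs between test-image features and normal reference features. As a hedged illustration of what such a volume can look like, the sketch below builds one from cosine distances between toy patch vectors and reduces it to per-patch scores by taking the best (lowest-cost) match. The cosine cost and the nearest-neighbour reduction are assumptions chosen for simplicity; UCF's actual contribution, the learned filtering of this volume, is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; zero vectors are treated as having norm 1."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cost_volume(test_patches: Sequence[Vector],
                ref_patches: Sequence[Vector]) -> List[List[float]]:
    """cost[i][j] = 1 - cos(test patch i, normal reference patch j) (assumed cost)."""
    return [[1.0 - cosine(t, r) for r in ref_patches] for t in test_patches]


def anomaly_scores(volume: Sequence[Sequence[float]]) -> List[float]:
    """Nearest-neighbour reduction: a patch is as anomalous as its best match is poor."""
    return [min(row) for row in volume]


normal_refs = [[1.0, 0.0], [0.8, 0.2]]
test_patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # middle patch is the outlier
volume = cost_volume(test_patches, normal_refs)       # shape: 3 test x 2 reference
scores = anomaly_scores(volume)
```

A refinement method in the spirit of UCF would transform `volume` (for example, smoothing costs across neighbouring positions under image guidance) before the final reduction.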
Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection [17.263625932911534]
Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories. Existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains. We introduce PILOT, a framework designed to overcome these challenges through two key innovations.
arXiv Detail & Related papers (2025-08-01T17:00:12Z)
Towards Zero-shot 3D Anomaly Localization [58.62650061201283]
3DzAL is a novel patch-level contrastive learning framework for 3D anomaly detection and localization. We show that 3DzAL outperforms state-of-the-art methods in anomaly detection and localization.
arXiv Detail & Related papers (2024-12-05T16:25:27Z)
Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [109.72772150095646]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for accurate ZSAD. Experiments on 19 real-world datasets, covering both industrial defects and medical anomalies, demonstrate that FAPrompt substantially outperforms state-of-the-art methods in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z)
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization [31.854923603517264]
We propose a novel zero-shot anomaly detection (ZSAD) method called FiLo. FiLo comprises two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High-Quality Localization (HQ-Loc). Experimental results on datasets such as MVTec and VisA demonstrate that FiLo significantly improves the performance of ZSAD in both detection and localization.
arXiv Detail & Related papers (2024-04-21T14:22:04Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to fine-tune the adapters and learn task-oriented representations for anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
The scarcity of anomaly labels hinders traditional supervised models in time series anomaly detection. Various state-of-the-art deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD).
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
Enhancing Unsupervised Anomaly Detection with Score-Guided Network [13.127091975959358]
Anomaly detection plays a crucial role in various real-world applications, including healthcare and finance systems. We propose a novel scoring network with a score-guided regularization to learn and enlarge the anomaly score disparities between normal and abnormal data. We next propose a score-guided autoencoder (SG-AE), incorporating the scoring network into an autoencoder framework for anomaly detection.
arXiv Detail & Related papers (2021-09-10T06:14:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
