Related papers: Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Images and Videos

URL: http://arxiv.org/abs/2302.06670v4
Date: Fri, 15 Aug 2025 23:09:56 GMT
Title: Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Images and Videos
Authors: Yizhou Wang, Dongliang Guo, Sheng Li, Octavia Camps, Yun Fu
Abstract summary: Anomaly detection and localization in visual data, including images and videos, are crucial in machine learning and real-world applications. This paper provides the first comprehensive survey focused specifically on explainable 2D visual anomaly detection (X-VAD). We present a literature review of explainable methods, categorized by their underlying techniques. We discuss promising future directions and open problems, including quantifying explanation quality.
Score: 49.07140708026425
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Anomaly detection and localization in visual data, including images and videos, are crucial in machine learning and real-world applications. Despite rapid advancements in visual anomaly detection (VAD), interpreting these often black-box models and explaining why specific instances are flagged as anomalous remains challenging. This paper provides the first comprehensive survey focused specifically on explainable 2D visual anomaly detection (X-VAD), covering methods for both images (IAD) and videos (VAD). We first introduce the background of IAD and VAD. Then, as the core contribution, we present a thorough literature review of explainable methods, categorized by their underlying techniques (e.g., attention-based, generative model-based, reasoning-based, foundation model-based). We analyze the commonalities and differences in applying these methods across image and video modalities, highlighting modality-specific challenges and opportunities for explainability. Additionally, we summarize relevant datasets and evaluation metrics, discussing both standard performance metrics and emerging approaches for assessing explanation quality (e.g., faithfulness, stability). Finally, we discuss promising future directions and open problems, including quantifying explanation quality, explaining diverse AD paradigms (SSL, zero-shot), enhancing context-awareness, leveraging foundation models responsibly, and addressing real-world constraints like efficiency and robustness. A curated collection of related resources is available at https://github.com/wyzjack/Awesome-XAD.

Related papers

VAAS: Vision-Attention Anomaly Scoring for Image Manipulation Detection in Digital Forensics [0.0]
Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produce visually consistent forgeries that evade traditional detectors based on pixel or compression artefacts. This paper introduces Vision-Attention Anomaly Scoring (VAAS), a novel dual-module framework that integrates global attention-based anomaly estimation.
arXiv Detail & Related papers (2025-12-17T15:05:40Z)
A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts [2.0391237204597368]
The framework focuses on fundamental concepts such as object recognition, absolute and relative positions, and attribute identification. The proposed framework offers a valuable instrument for generating diverse and comprehensive datasets.
arXiv Detail & Related papers (2025-09-17T18:37:24Z)
REVEAL -- Reasoning and Evaluation of Visual Evidence through Aligned Language [0.1388281922732496]
We frame this problem of forgery detection as a prompt-driven visual reasoning task, leveraging the semantic alignment capabilities of large vision-language models. We propose two approaches: (1) Holistic Scene-level Evaluation, which relies on the physics, semantics, perspective, and realism of the image as a whole, and (2) Region-wise anomaly detection, which splits the image into multiple regions and analyzes each of them.
arXiv Detail & Related papers (2025-08-18T00:42:02Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline [63.96226274616927]
A new framework called Track Any Anomalous Object (TAO) introduces a granular video anomaly detection pipeline. Unlike methods that assign anomaly scores to every pixel, our approach transforms the problem into pixel-level tracking of anomalous objects. Experiments demonstrate that TAO sets new benchmarks in accuracy and robustness.
arXiv Detail & Related papers (2025-06-05T15:49:39Z)
Out-of-Distribution Detection on Graphs: A Survey [58.47395497985277]
Graph out-of-distribution (GOOD) detection focuses on identifying graph data that deviates from the distribution seen during training. We categorize existing methods into four types: enhancement-based, reconstruction-based, information propagation-based, and classification-based approaches. We discuss practical applications and theoretical foundations, highlighting the unique challenges posed by graph data.
arXiv Detail & Related papers (2025-02-12T04:07:12Z)
Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation. Our approach can be applied to existing datasets by automatically generating hard negative test captions. Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models. We evaluate nine existing Video-LMMs, both open and closed sources, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection [28.973078719467516]
We develop the Multi-pose Anomaly Detection (MAD) dataset and the Pose-agnostic Anomaly Detection (PAD) benchmark. Specifically, we build MAD using 20 complex-shaped LEGO toys with various poses, and high-quality and diverse 3D anomalies in both simulated and real environments. We also propose a novel method, OmniposeAD, trained using MAD and specifically designed for pose-agnostic anomaly detection.
arXiv Detail & Related papers (2023-10-11T17:59:56Z)
Understanding the Challenges and Opportunities of Pose-based Anomaly Detection [2.924868086534434]
Pose-based anomaly detection is a video-analysis technique for detecting anomalous events or behaviors by examining human pose extracted from the video frames. In this work, we analyze and quantify the characteristics of two well-known video anomaly datasets to better understand the difficulties of pose-based anomaly detection. We believe these experiments are beneficial for a better comprehension of pose-based anomaly detection and the datasets currently available.
arXiv Detail & Related papers (2023-03-09T18:09:45Z)
Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection [90.32910087103744]
A few labeled anomaly examples are often available in many real-world applications. These anomaly examples provide valuable knowledge about the application-specific abnormality. Those anomalies seen during training often do not illustrate every possible class of anomaly. This paper tackles open-set supervised anomaly detection.
arXiv Detail & Related papers (2022-03-28T05:21:37Z)
A Survey of Visual Sensory Anomaly Detection [53.23336329817023]
Visual sensory anomaly detection (AD) is an essential problem in computer vision. We provide a comprehensive review of visual sensory AD and categorize it into three levels according to the form of anomalies.
arXiv Detail & Related papers (2022-02-14T19:50:03Z)
Approaches Toward Physical and General Video Anomaly Detection [0.0]
Anomaly detection in videos may enable automatic detection of malfunctions in many manufacturing, maintenance, and real-life settings. We introduce the Physical Anomalous Trajectory or Motion dataset, which contains six different video classes. We suggest an even harder benchmark where anomalous activities should be spotted on highly variable scenes.
arXiv Detail & Related papers (2021-12-14T18:57:44Z)
A Critical Study on the Recent Deep Learning Based Semi-Supervised Video Anomaly Detection Methods [3.198144010381572]
This paper introduces researchers in the field to a new perspective and reviews recent deep-learning-based semi-supervised video anomaly detection approaches. Our goal is to help researchers develop more effective video anomaly detection methods.
arXiv Detail & Related papers (2021-11-02T14:00:33Z)
How can we learn (more) from challenges? A statistical approach to driving future algorithm development [1.0690055408831725]
We present a statistical framework for learning from challenges and instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge 2019. Our method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images in which previous methods tended to fail.
arXiv Detail & Related papers (2021-06-17T08:12:37Z)
Self-Supervised Representation Learning for Visual Anomaly Detection [9.642625267699488]
We consider the problem of anomaly detection in images and videos, and present a new visual anomaly detection technique for videos. We propose a simple self-supervision approach for learning temporal coherence across video frames without the use of any optical flow information. This intuitive approach shows superior visual anomaly detection performance compared to numerous methods for images and videos on the UCF101 and ILSVRC2015 video datasets.
arXiv Detail & Related papers (2020-06-17T04:37:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.