Related papers: Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

URL: http://arxiv.org/abs/2405.00181v2
Date: Mon, 6 May 2024 14:57:50 GMT
Title: Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Authors: Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
Abstract summary: We present a benchmark for Causation Understanding of Video Anomaly (CUVA). Each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly. MMEval is a novel evaluation metric designed to better align with human preferences for CUVA.
Score: 29.822544507594056
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA). Specifically, each instance of the proposed benchmark involves three sets of human annotations to indicate the "what", "why" and "how" of an anomaly, including 1) anomaly type, start and end times, and event descriptions, 2) natural language explanations for the cause of an anomaly, and 3) free text reflecting the effect of the abnormality. In addition, we also introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of existing LLMs in comprehending the underlying cause and corresponding effect of video anomalies. Finally, we propose a novel prompt-based method that can serve as a baseline approach for the challenging CUVA. We conduct extensive experiments to show the superiority of our evaluation metric and the prompt-based approach. Our code and dataset are available at https://github.com/fesvhtr/CUVA.
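
The three annotation sets described in the abstract can be pictured as one record per video. The following is a minimal sketch only: the class name, field names, and example values are hypothetical and are not taken from the CUVA dataset or its code, whose actual schema may differ.

```python
from dataclasses import dataclass


@dataclass
class AnomalyAnnotation:
    """Hypothetical record mirroring the "what", "why" and "how" annotation sets."""

    # "What": anomaly type, start and end times, and an event description.
    anomaly_type: str
    start_sec: float
    end_sec: float
    description: str
    # "Why": natural language explanation for the cause of the anomaly.
    cause: str
    # "How": free text reflecting the effect of the abnormality.
    effect: str

    def duration(self) -> float:
        """Length of the annotated anomaly interval in seconds."""
        return self.end_sec - self.start_sec


# Made-up example values, for illustration only.
example = AnomalyAnnotation(
    anomaly_type="traffic accident",
    start_sec=12.0,
    end_sec=20.0,
    description="A car runs a red light and hits a motorcycle.",
    cause="The driver did not stop at the red light.",
    effect="The motorcyclist falls and traffic is blocked.",
)
print(example.duration())
```

Keeping the three annotation sets as separate fields makes it possible to score a model's "what", "why" and "how" answers independently.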

Related papers

VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding [22.43740206690383]
Video Anomaly Detection (VAD) aims to identify anomalous events in videos and accurately determine their time intervals. VAGU is the first benchmark to integrate anomaly understanding and grounding. We propose Glance then Scrutinize (GtS), a training-free framework guided by textual prompts. We also propose the JeAUG metric, which jointly evaluates semantic interpretability and temporal precision.
arXiv Detail & Related papers (2025-07-29T05:17:48Z)
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning [12.293826084601115]
Video anomaly understanding is essential for smart cities, security surveillance, and disaster alert systems. Despite advances in anomaly detection, existing methods often lack interpretability and struggle to capture the causal and contextual aspects of abnormal events. We introduce VAU-R1, a data-efficient framework built upon Multimodal Large Language Models (MLLMs), which enhances anomaly reasoning through Reinforcement Fine-Tuning (RFT).
arXiv Detail & Related papers (2025-05-29T14:48:10Z)
Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly [12.896651217314744]
We introduce a benchmark for Exploring the Causation of Video Anomalies (ECVA). Our benchmark is meticulously designed, with each video accompanied by detailed human annotations. We propose AnomEval, a specialized evaluation metric crafted to align closely with human judgment criteria for ECVA.
arXiv Detail & Related papers (2024-12-10T04:41:44Z)
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM [35.06386971859359]
Holmes-VAD is a novel framework that leverages precise temporal supervision and rich multimodal instructions. We construct the first large-scale multimodal VAD instruction-tuning benchmark, VAD-Instruct50k. Building upon the VAD-Instruct50k dataset, we develop a customized solution for interpretable video anomaly detection.
arXiv Detail & Related papers (2024-06-18T03:19:24Z)
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models. We evaluate nine existing Video-LMMs, both open and closed sources, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection [16.77262005540559]
A novel framework is proposed to guide the learning of suspected anomalies from event prompts. It enables a new multi-prompt learning process to constrain the visual-semantic features across all videos. Our proposed model outperforms most state-of-the-art methods in terms of AP or AUC.
arXiv Detail & Related papers (2024-03-02T10:42:47Z)
Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications. Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos by cross-modalities. We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation [46.687762316415096]
We propose a new comprehensive dataset, NWPU Campus, containing 43 scenes, 28 classes of abnormal events, and 16 hours of videos. It is the largest semi-supervised VAD dataset with the largest number of scenes and classes of anomalies, the longest duration, and the only one considering the scene-dependent anomaly. We propose a novel model capable of detecting and anticipating anomalous events simultaneously.
arXiv Detail & Related papers (2023-05-23T02:20:12Z)
Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
In many application scenarios, improper video summarization can have a large impact. Modeling explainability is a key concern. A Causal Explainer, dubbed Causalainer, is proposed to address this issue.
arXiv Detail & Related papers (2023-04-30T11:42:06Z)
Towards Open Set Video Anomaly Detection [11.944167192592905]
Open Set Video Anomaly Detection (OpenVAD) aims to identify abnormal events from video data where both known anomalies and novel ones exist in testing. We develop a novel weakly supervised method for the OpenVAD problem by integrating evidential deep learning (EDL) and normalizing flows (NFs) into a multiple instance learning (MIL) framework.
arXiv Detail & Related papers (2022-08-23T17:53:34Z)
Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection [114.9714355807607]
We show that applying self-trained deep ordinal regression to video anomaly detection overcomes two key limitations of existing methods. We devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
arXiv Detail & Related papers (2020-03-15T08:44:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.