Vision-Language Models Assisted Unsupervised Video Anomaly Detection
- URL: http://arxiv.org/abs/2409.14109v2
- Date: Thu, 26 Sep 2024 01:38:52 GMT
- Title: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
- Authors: Yalong Jiang, Liquan Mao
- Abstract summary: Anomaly samples present significant challenges for unsupervised learning methods.
Our method employs a cross-modal pre-trained model that leverages the inferential capabilities of large language models.
By mapping high-dimensional visual features to low-dimensional semantic ones, our method significantly enhances the interpretability of unsupervised anomaly detection.
- Score: 3.1095294567873606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection is a subject of great interest across industrial and academic domains due to its crucial role in computer vision applications. However, the inherent unpredictability of anomalies and the scarcity of anomaly samples present significant challenges for unsupervised learning methods. To overcome the limitations of unsupervised learning, which stem from a lack of comprehensive prior knowledge about anomalies, we propose VLAVAD (Video-Language Models Assisted Anomaly Detection). Our method employs a cross-modal pre-trained model that leverages the inferential capabilities of large language models (LLMs) in conjunction with a Selective-Prompt Adapter (SPA) for selecting semantic space. Additionally, we introduce a Sequence State Space Module (S3M) that detects temporal inconsistencies in semantic features. By mapping high-dimensional visual features to low-dimensional semantic ones, our method significantly enhances the interpretability of unsupervised anomaly detection. Our proposed approach effectively tackles the challenge of detecting elusive anomalies that are hard to discern over periods, achieving SOTA on the challenging ShanghaiTech dataset.
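The general idea in the abstract, projecting high-dimensional visual features onto a small semantic space and then looking for temporal inconsistencies, can be illustrated with a toy sketch. This is not the authors' SPA or S3M: the prompt embeddings, the frame features, the cosine-similarity projection, and the sliding-window score below are all assumptions chosen only to make the idea concrete.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_features(frame_feats, prompt_embs):
    """Map each frame feature to one similarity value per prompt embedding."""
    return [[cosine(f, p) for p in prompt_embs] for f in frame_feats]

def inconsistency_scores(sem, window=3):
    """Distance of each semantic vector from the mean of the preceding window."""
    scores = []
    for t, cur in enumerate(sem):
        past = sem[max(0, t - window):t]
        if not past:
            scores.append(0.0)  # no history for the first frame
            continue
        mean = [sum(col) / len(past) for col in zip(*past)]
        scores.append(math.dist(cur, mean))
    return scores

# Hypothetical data: two prompt embeddings and six 4-dimensional frame features.
prompt_embs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
frame_feats = [[1.0, 0.1, 0.0, 0.2]] * 5 + [[0.1, 1.0, 0.0, 0.2]]

sem = semantic_features(frame_feats, prompt_embs)   # 6 frames x 2 semantic dims
scores = inconsistency_scores(sem)
print(scores.index(max(scores)))  # index of the most inconsistent frame
```

Each semantic dimension here corresponds to one prompt, so a high score can be traced back to the prompt whose similarity changed, which is the sense in which a low-dimensional semantic space aids interpretability.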
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation [5.0923114224599555]
This paper introduces a novel hierarchical graph neural network (GNN) based model MissionGNN.
Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models.
Our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches.
arXiv Detail & Related papers (2024-06-27T01:09:07Z)
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z)
- Video Anomaly Detection using GAN [0.0]
This thesis aims to offer a solution for this use case so that human resources are not required to watch for unusual activity in surveillance system records.
We have developed a novel generative adversarial network (GAN) based anomaly detection model.
arXiv Detail & Related papers (2023-11-23T16:41:30Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- Future Video Prediction from a Single Frame for Video Anomaly Detection [0.38073142980732994]
Video anomaly detection (VAD) is an important but challenging task in computer vision.
We introduce future frame prediction as a novel proxy task for video anomaly detection.
This proxy-task alleviates the challenges of previous methods in learning longer motion patterns.
arXiv Detail & Related papers (2023-08-15T14:04:50Z)
- MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection [39.923871347007875]
We propose a novel glance and focus network to integrate spatial-temporal information for accurate anomaly detection.
Existing approaches that use feature magnitudes to represent the degree of anomalies typically ignore the effects of scene variations.
We propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies.
arXiv Detail & Related papers (2022-11-28T07:10:36Z)
- Spatio-temporal predictive tasks for abnormal event detection in videos [60.02503434201552]
We propose new constrained pretext tasks to learn object level normality patterns.
Our approach consists of learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics.
Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies.
arXiv Detail & Related papers (2022-10-27T19:45:12Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
- Manifolds for Unsupervised Visual Anomaly Detection [79.22051549519989]
Unsupervised learning methods that don't necessarily encounter anomalies in training would be immensely useful.
We develop a novel hyperspherical Variational Auto-Encoder (VAE) via stereographic projections with a gyroplane layer.
We present state-of-the-art results on visual anomaly benchmarks in precision manufacturing and inspection, demonstrating real-world utility in industrial AI scenarios.
arXiv Detail & Related papers (2020-06-19T20:41:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.