Video Anomaly Detection by Estimating Likelihood of Representations
- URL: http://arxiv.org/abs/2012.01468v1
- Date: Wed, 2 Dec 2020 19:16:22 GMT
- Title: Video Anomaly Detection by Estimating Likelihood of Representations
- Authors: Yuqi Ouyang, Victor Sanchez
- Abstract summary: Video anomaly detection is a challenging task because it involves solving many sub-tasks such as motion representation, object localization and action recognition.
Traditionally, solutions to this task have focused on the mapping between video frames and their low-dimensional features, while ignoring the spatial connections of those features.
Recent solutions focus on analyzing these spatial connections by using hard clustering techniques, such as K-Means, or applying neural networks to map latent features to a general understanding.
To detect video anomalies in the latent feature space, we propose a deep probabilistic model that transforms this task into a density estimation problem.
- Score: 21.879366166261228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection is a challenging task not only because it involves
solving many sub-tasks such as motion representation, object localization and
action recognition, but also because it is commonly considered as an
unsupervised learning problem that involves detecting outliers. Traditionally,
solutions to this task have focused on the mapping between video frames and
their low-dimensional features, while ignoring the spatial connections of those
features. Recent solutions focus on analyzing these spatial connections by
using hard clustering techniques, such as K-Means, or applying neural networks
to map latent features to a general understanding, such as action attributes.
To detect video anomalies in the latent feature space, we propose a deep
probabilistic model that transforms this task into a density estimation problem
where latent manifolds are generated by a deep denoising autoencoder and
clustered by expectation maximization. Evaluations on several benchmark
datasets show the strengths of our model, achieving outstanding performance on
challenging datasets.
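The density-estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the latent vectors below are synthetic stand-ins for the output of a denoising autoencoder, and the diagonal-covariance Gaussian mixture, the number of components `k`, and the function names `fit_gmm_em` and `log_likelihood` are all assumptions made for illustration. The mixture is fitted with expectation maximization on "normal" latents only, and the negative log-likelihood of a new latent vector serves as its anomaly score.

```python
import numpy as np

def component_log_probs(z, weights, means, variances):
    """Log of (mixture weight * diagonal Gaussian density) per point and component."""
    diff = z[:, None, :] - means[None, :, :]                 # shape (n, k, d)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    return np.log(weights) + log_gauss                        # shape (n, k)

def log_likelihood(z, weights, means, variances):
    """Per-point log-likelihood under the mixture (stable log-sum-exp)."""
    log_p = component_log_probs(z, weights, means, variances)
    m = log_p.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))).ravel()

def fit_gmm_em(z, k=2, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to latent vectors with EM."""
    rng = np.random.default_rng(seed)
    n, _ = z.shape
    means = z[rng.choice(n, size=k, replace=False)]           # init means from data
    variances = np.tile(z.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        log_p = component_log_probs(z, weights, means, variances)
        ll = log_likelihood(z, weights, means, variances)
        resp = np.exp(log_p - ll[:, None])
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ z) / nk[:, None]
        second_moment = (resp.T @ z ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, eps)
    return weights, means, variances

# Synthetic latents (assumption): two clusters of "normal" behaviour in 4-D.
rng = np.random.default_rng(42)
normal_train = np.vstack([rng.normal(-2.0, 0.5, size=(200, 4)),
                          rng.normal(2.0, 0.5, size=(200, 4))])
normal_test = np.vstack([rng.normal(-2.0, 0.5, size=(20, 4)),
                         rng.normal(2.0, 0.5, size=(20, 4))])
anomalous_test = rng.normal(8.0, 0.5, size=(10, 4))          # far from both clusters

weights, means, variances = fit_gmm_em(normal_train, k=2)

# Anomaly score: negative log-likelihood under the fitted density.
normal_scores = -log_likelihood(normal_test, weights, means, variances)
anomaly_scores = -log_likelihood(anomalous_test, weights, means, variances)
print("mean score (normal):   ", normal_scores.mean())
print("mean score (anomalous):", anomaly_scores.mean())
```

In a real pipeline the synthetic arrays would be replaced by latent features extracted from video frames, and a threshold on the score would be chosen on held-out data.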
Related papers
- HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation [5.0923114224599555]
This paper introduces a novel hierarchical graph neural network (GNN) based model MissionGNN.
Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models.
Our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches.
arXiv Detail & Related papers (2024-06-27T01:09:07Z) - Understanding the Challenges and Opportunities of Pose-based Anomaly
Detection [2.924868086534434]
Pose-based anomaly detection is a video-analysis technique for detecting anomalous events or behaviors by examining human pose extracted from the video frames.
In this work, we analyze and quantify the characteristics of two well-known video anomaly datasets to better understand the difficulties of pose-based anomaly detection.
We believe these experiments are beneficial for a better comprehension of pose-based anomaly detection and the datasets currently available.
arXiv Detail & Related papers (2023-03-09T18:09:45Z) - DQnet: Cross-Model Detail Querying for Camouflaged Object Detection [54.82390534024954]
A convolutional neural network (CNN) for camouflaged object detection tends to activate local discriminative regions while ignoring complete object extent.
In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN.
In order to obtain feature maps that could activate full object extent, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed.
arXiv Detail & Related papers (2022-12-16T06:23:58Z) - High-resolution Iterative Feedback Network for Camouflaged Object
Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Generalization of Neural Combinatorial Solvers Through the Lens of
Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - A Topological Approach for Motion Track Discrimination [10.72000349055617]
We use characteristics of target tracks extracted from video sequences as data from which to derive distinguishing topological features.
In particular, we calculate persistent homology from time-delayed embeddings of dynamic statistics calculated from motion tracks extracted from a wide field-of-view video stream.
arXiv Detail & Related papers (2021-02-10T19:25:38Z) - Graph Convolutional Networks for traffic anomaly [4.172516437934823]
Event detection is an important task in transportation; its goal is to detect points in time when large events disrupt a large portion of the urban traffic network.
Fully capturing the spatial and temporal traffic patterns remains a challenge, yet it plays a crucial role in effective anomaly detection.
We formulate the problem in a novel way, as detecting anomalies in a set of directed weighted graphs representing the traffic conditions at each time interval.
arXiv Detail & Related papers (2020-12-25T22:36:22Z) - Unsupervised Spatio-temporal Latent Feature Clustering for
Multiple-object Tracking and Segmentation [0.5591659577198183]
We propose a strategy that treats the temporal identification task as a heterogeneous-temporal clustering problem.
We use a convolutional and fully connected autoencoder to learn discriminative features from segmentation masks and detection bounding boxes.
Our results show that our technique outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-07-14T16:47:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.