Real-world Video Anomaly Detection by Extracting Salient Features in
Videos
- URL: http://arxiv.org/abs/2209.06435v1
- Date: Wed, 14 Sep 2022 06:03:09 GMT
- Title: Real-world Video Anomaly Detection by Extracting Salient Features in
Videos
- Authors: Yudai Watanabe, Makoto Okabe, Yasunori Harada, Naoji Kashima
- Abstract summary: Existing methods used multiple-instance learning (MIL) to determine the normal/abnormal status of each segment of the video.
We propose a lightweight model with a self-attention mechanism to automatically extract features that are important for determining normal/abnormal from all input segments.
Our method can achieve accuracy comparable to or better than that of state-of-the-art methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a lightweight and accurate method for detecting anomalies in
videos. Existing methods used multiple-instance learning (MIL) to determine the
normal/abnormal status of each segment of the video. Recent successful
studies argue that it is important to learn the temporal relationships among
segments to achieve high accuracy, instead of focusing on only a single
segment. Therefore we analyzed the existing methods that have been successful
in recent years, and found that while it is indeed important to learn all
segments together, the temporal orders among them are irrelevant to achieving
high accuracy. Based on this finding, we do not use the MIL framework, but
instead propose a lightweight model with a self-attention mechanism to
automatically extract features that are important for determining
normal/abnormal from all input segments. As a result, our neural network model
has 1.3% of the number of parameters of the existing method. We evaluated the
frame-level detection accuracy of our method on three benchmark datasets
(UCF-Crime, ShanghaiTech, and XD-Violence) and demonstrated that our method can
achieve accuracy comparable to or better than that of state-of-the-art methods.
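The abstract's central observation, that all segments should be considered together while their temporal order does not matter, can be illustrated with a small sketch. The code below is not the authors' architecture: it uses single-query attention pooling as a simplified stand-in for a self-attention mechanism, and the names `attention_pool`, `anomaly_score`, `query`, `classifier_w`, and `classifier_b` are hypothetical, with random values standing in for learned parameters. Because no positional information is used, shuffling the segments leaves the video-level score unchanged up to floating-point rounding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segments, query):
    """Weight each segment feature by its scaled dot product with a query.

    Assumption: `query` stands in for a learned parameter. No positional
    encoding is used, so the pooled vector ignores segment order.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(q * x for q, x in zip(query, seg)) / scale for seg in segments]
    weights = softmax(scores)
    pooled = [sum(w * seg[i] for w, seg in zip(weights, segments)) for i in range(dim)]
    return pooled, weights


def anomaly_score(segments, query, classifier_w, classifier_b):
    """Map the pooled feature to a video-level score in (0, 1) via a sigmoid."""
    pooled, _ = attention_pool(segments, query)
    logit = sum(w * p for w, p in zip(classifier_w, pooled)) + classifier_b
    return 1.0 / (1.0 + math.exp(-logit))


# Toy data: 6 segments with 4-dimensional features (hypothetical sizes).
rng = random.Random(0)
segments = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
query = [rng.gauss(0, 1) for _ in range(4)]
classifier_w = [rng.gauss(0, 1) for _ in range(4)]
classifier_b = 0.0

score = anomaly_score(segments, query, classifier_w, classifier_b)

# Shuffle the segment order and score again.
shuffled = segments[:]
rng.shuffle(shuffled)
score_shuffled = anomaly_score(shuffled, query, classifier_w, classifier_b)

print("order-invariant:", math.isclose(score, score_shuffled, rel_tol=1e-9))
```

The attention weights returned by `attention_pool` indicate which segments contribute most to the pooled feature, which is one plausible way to read "salient" segments in this simplified setting.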
Related papers
- Improving Online Lane Graph Extraction by Object-Lane Clustering [106.71926896061686]
We propose an architecture and loss formulation to improve the accuracy of local lane graph estimates.
The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers.
We show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods.
arXiv Detail & Related papers (2023-07-20T15:21:28Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [9.145168943972067]
Multiple-instance learning (MIL) provides an effective way to tackle the video anomaly detection problem.
We propose to conduct novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training.
Our theoretical analysis ensures a strong performance guarantee of the proposed algorithm.
arXiv Detail & Related papers (2022-03-24T04:00:49Z)
- A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
arXiv Detail & Related papers (2021-06-10T15:29:54Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- A Fast Point Cloud Ground Segmentation Approach Based on Coarse-To-Fine Markov Random Field [0.32546166337127946]
A fast point cloud ground segmentation approach based on a coarse-to-fine Markov random field (MRF) method is proposed.
Experiments on datasets showed that our method improves on other algorithms in terms of ground segmentation accuracy.
arXiv Detail & Related papers (2020-11-26T06:07:24Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
- Fast Template Matching and Update for Video Object Tracking and Segmentation [56.465510428878]
The main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames.
The challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template.
We propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time.
arXiv Detail & Related papers (2020-04-16T08:58:45Z)
- Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection [114.9714355807607]
We show that applying self-trained deep ordinal regression to video anomaly detection overcomes two key limitations of existing methods.
We devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
arXiv Detail & Related papers (2020-03-15T08:44:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.