Overlooked Video Classification in Weakly Supervised Video Anomaly Detection
- URL: http://arxiv.org/abs/2210.06688v2
- Date: Wed, 19 Apr 2023 22:23:33 GMT
- Title: Overlooked Video Classification in Weakly Supervised Video Anomaly Detection
- Authors: Weijun Tan, Qi Yao, Jingfeng Liu
- Abstract summary: We explicitly study the power of video classification supervision using a BERT or an LSTM.
With this BERT or LSTM, the CNN features of all snippets of a video can be aggregated into a single feature that can be used for video classification.
This simple yet powerful video classification supervision, combined with the MIL framework, brings extraordinary performance improvements on all three major video anomaly detection datasets.
- Score: 4.162019309587633
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current weakly supervised video anomaly detection algorithms mostly use
multiple instance learning (MIL) or its variants. Almost all recent approaches
focus on how to select the correct snippets for training to improve performance.
They overlook, or do not realize, the power of video classification in boosting
the performance of anomaly detection. In this paper, we explicitly study the
power of video classification supervision using a BERT or an LSTM. With this
BERT or LSTM, the CNN features of all snippets of a video can be aggregated into
a single feature that can be used for video classification. This simple yet
powerful video classification supervision, combined with the MIL framework,
brings extraordinary performance improvements on all three major video anomaly
detection datasets. In particular, it improves the mean average precision (mAP)
on XD-Violence from the previous SOTA of 78.84% to 82.10%. The source code is
available at
https://github.com/wjtan99/BERT_Anomaly_Video_Classification.
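The abstract's core idea can be sketched in PyTorch: per-snippet CNN features are aggregated by an LSTM into a single video-level feature for a video classification loss, which is added to a standard top-k MIL loss over snippet-level anomaly scores. This is a minimal illustration, not the paper's exact architecture; all layer sizes, the top-k value, and the loss weighting are assumptions.

```python
import torch
import torch.nn as nn

class VideoClassifierHead(nn.Module):
    """Sketch: aggregate per-snippet CNN features with an LSTM into one
    video-level feature, then classify the whole video (normal vs. anomalous).
    Dimensions are illustrative, not the paper's exact configuration."""
    def __init__(self, feat_dim=1024, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.video_cls = nn.Linear(hidden_dim, 1)   # video-level classification head
        self.snippet_cls = nn.Linear(feat_dim, 1)   # per-snippet MIL anomaly scores

    def forward(self, snippet_feats):
        # snippet_feats: (batch, num_snippets, feat_dim) from a CNN backbone
        _, (h_n, _) = self.lstm(snippet_feats)
        video_logit = self.video_cls(h_n[-1])             # (batch, 1)
        snippet_logits = self.snippet_cls(snippet_feats)  # (batch, num_snippets, 1)
        return video_logit.squeeze(-1), snippet_logits.squeeze(-1)

def combined_loss(video_logit, snippet_logits, video_label, k=3):
    # MIL term: the mean of the top-k snippet scores stands in for the bag label
    topk = snippet_logits.topk(k, dim=1).values.mean(dim=1)
    bce = nn.functional.binary_cross_entropy_with_logits
    mil_loss = bce(topk, video_label)
    # Added video-classification supervision on the aggregated feature
    cls_loss = bce(video_logit, video_label)
    return mil_loss + cls_loss
```

At inference time only the per-snippet scores are needed for frame-level anomaly localization; the video classification branch serves as auxiliary training supervision.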
Related papers
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- Semi-supervised Active Learning for Video Action Detection [8.110693267550346]
We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data.
We evaluate the proposed approach on three different benchmark datasets: UCF-101-24, JHMDB-21, and Youtube-VOS.
arXiv Detail & Related papers (2023-12-12T11:13:17Z)
- Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data [102.0069667710562]
This paper presents Open-VCLIP++, a framework that adapts CLIP to a strong zero-shot video classifier.
We demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data.
Our approach is evaluated on three widely used action recognition datasets.
arXiv Detail & Related papers (2023-10-08T04:46:43Z)
- Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos.
The proposed framework is validated on a real-world dataset, the ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
- Cleaning Label Noise with Clusters for Minimally Supervised Anomaly Detection [26.062659852373653]
We formulate a weakly supervised anomaly detection method that is trained using only video-level labels.
The proposed method yields 78.27% and 84.16% frame-level AUC on UCF-crime and ShanghaiTech datasets respectively.
arXiv Detail & Related papers (2021-04-30T06:03:24Z)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates that a neural model learn from offline-extracted dense video features.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
- VideoMix: Rethinking Data Augmentation for Video Classification [29.923635550986997]
State-of-the-art video action classifiers often suffer from overfitting.
Recent data augmentation strategies have been reported to address the overfitting problems.
VideoMix lets a model learn beyond the object and scene biases and extract more robust cues for action recognition.
arXiv Detail & Related papers (2020-12-07T05:40:33Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)
- VideoSSL: Semi-Supervised Learning for Video Classification [30.348819309923098]
We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNNs).
To minimize the dependence on a large annotated dataset, our proposed method trains from a small number of labeled examples.
We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performances.
arXiv Detail & Related papers (2020-02-29T07:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.