MIST: Multiple Instance Self-Training Framework for Video Anomaly
Detection
- URL: http://arxiv.org/abs/2104.01633v1
- Date: Sun, 4 Apr 2021 15:47:14 GMT
- Title: MIST: Multiple Instance Self-Training Framework for Video Anomaly
Detection
- Authors: Jia-Chang Feng, Fa-Ting Hong, Wei-Shi Zheng
- Score: 76.80153360498797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised video anomaly detection (WS-VAD) aims to distinguish
anomalies from normal events based on discriminative representations. Most
existing works are limited by insufficient video representations. In this work,
we develop a multiple instance self-training framework (MIST) to efficiently
refine task-specific discriminative representations with only video-level
annotations. In particular, MIST is composed of 1) a multiple instance pseudo
label generator, which adapts a sparse continuous sampling strategy to produce
more reliable clip-level pseudo labels, and 2) a self-guided attention boosted
feature encoder that aims to automatically focus on anomalous regions in frames
while extracting task-specific representations. Moreover, we adopt a
self-training scheme to optimize both components and finally obtain a
task-specific feature encoder. Extensive experiments on two public datasets
demonstrate the efficacy of our method, which performs comparably to or even
better than existing supervised and weakly supervised methods, specifically
obtaining a frame-level AUC of 94.83% on ShanghaiTech.
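As a rough illustration of the multiple instance pseudo-labelling idea in the abstract, the sketch below turns a video-level label plus per-clip scores into clip-level pseudo labels. This is not the authors' code: the top-k selection rule and the `top_ratio` parameter are assumptions standing in for MIST's sparse continuous sampling strategy.

```python
from typing import List

def clip_pseudo_labels(clip_scores: List[float], video_label: int,
                       top_ratio: float = 0.2) -> List[int]:
    """Turn video-level supervision into clip-level pseudo labels.

    Normal videos (video_label == 0): every clip is labelled normal.
    Anomalous videos (video_label == 1): the highest-scoring fraction
    of clips (top_ratio) is labelled anomalous, the rest normal.
    """
    if video_label == 0:
        return [0] * len(clip_scores)
    k = max(1, int(len(clip_scores) * top_ratio))
    # indices of the k highest-scoring clips
    top = sorted(range(len(clip_scores)),
                 key=lambda i: clip_scores[i], reverse=True)[:k]
    labels = [0] * len(clip_scores)
    for i in top:
        labels[i] = 1
    return labels
```

In a self-training loop, these pseudo labels would then supervise the feature encoder, whose improved clip scores feed the next round of pseudo-label generation.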
Related papers
- Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection [11.250490586786878]
Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos.
We show that distilling knowledge from aggregated representations into a relatively simple model achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-06-05T00:44:42Z)
- Weakly Supervised Video Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining individuals.
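One common shape such a soft contrastive loss can take is a penalty on the gap between pairwise feature similarity and a soft (non-binary) target. The sketch below is an illustrative guess at that generic form, not the paper's actual loss; the squared-error formulation is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def soft_contrastive_loss(pairs, targets):
    """Mean squared gap between cosine similarity and a soft target in [0, 1].

    pairs: list of (u, v) feature vectors; targets: the soft similarity each
    pair should attain (1.0 pulls together, 0.0 pushes apart, intermediate
    values supervise partial agreement).
    """
    losses = [(cosine(u, v) - t) ** 2 for (u, v), t in zip(pairs, targets)]
    return sum(losses) / len(losses)
```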
arXiv Detail & Related papers (2023-12-10T16:12:13Z)
- A Coarse-to-Fine Pseudo-Labeling (C2FPL) Framework for Unsupervised Video Anomaly Detection [4.494911384096143]
Detection of anomalous events in videos is an important problem in applications such as surveillance.
We propose a simple-but-effective two-stage pseudo-label generation framework that produces segment-level (normal/anomaly) pseudo-labels.
The proposed coarse-to-fine pseudo-label generator employs carefully-designed hierarchical divisive clustering and statistical hypothesis testing.
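The coarse stage of such a pipeline can be caricatured as clustering segment statistics into a low and a high group and calling the high group anomalous. The 1-D 2-means below is a toy stand-in, not the paper's method, which uses hierarchical divisive clustering followed by statistical hypothesis testing.

```python
def two_means_1d(values, iters=20):
    """Toy 1-D 2-means: split scalar values into a low and a high cluster,
    returning the two cluster means."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not low or not high:
            break
        lo = sum(low) / len(low)
        hi = sum(high) / len(high)
    return lo, hi

def coarse_pseudo_labels(segment_scores):
    """Label a segment 1 (anomalous) if it sits closer to the high cluster."""
    lo, hi = two_means_1d(segment_scores)
    return [1 if abs(s - hi) < abs(s - lo) else 0 for s in segment_scores]
```

A fine stage would then refine these segment-level pseudo labels, e.g. by testing each candidate segment against the statistics of the normal cluster.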
arXiv Detail & Related papers (2023-10-26T17:59:19Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Learning to Adapt to Unseen Abnormal Activities under Weak Supervision [43.40900198498228]
We present a meta-learning framework for weakly supervised anomaly detection in videos.
Our framework learns to adapt to unseen types of abnormal activities effectively when only video-level annotations of binary labels are available.
arXiv Detail & Related papers (2022-03-25T12:15:44Z)
- ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learn robust video representations.
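That observation is often operationalized as a consistency objective that pulls two views of the same video (e.g. different playback speeds or augmentations) together in feature space. The sketch below shows one generic form of such a loss, not ASCNet's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def consistency_loss(pairs):
    """Positive-pair consistency: two views of the same video should map to
    similar features, so the loss is (1 - cosine similarity) averaged over
    the positive pairs."""
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)
```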
arXiv Detail & Related papers (2021-06-04T08:44:50Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels [17.615297975503648]
Anomalous event detection in surveillance videos is a challenging and practical research problem in the image and video processing community.
We propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels.
The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-crime, ShanghaiTech and Ped2.
arXiv Detail & Related papers (2020-08-27T02:14:15Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method leverages per-person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
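The standard MIL assumption mentioned here is commonly implemented by scoring a bag with the maximum over its instance scores, so a bag is positive iff at least one instance is. The minimal sketch below shows that baseline; the threshold value is an arbitrary assumption, and the paper's contribution is precisely handling cases where this assumption fails.

```python
def bag_score(instance_scores):
    """Standard MIL aggregation: a positive bag contains at least one
    positive instance, so the bag score is the instance maximum."""
    return max(instance_scores)

def bag_label(instance_scores, threshold=0.5):
    """Binarize the bag score with a fixed decision threshold."""
    return 1 if bag_score(instance_scores) >= threshold else 0
```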
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.