Consistency-based Self-supervised Learning for Temporal Anomaly
Localization
- URL: http://arxiv.org/abs/2208.05251v1
- Date: Wed, 10 Aug 2022 10:07:34 GMT
- Title: Consistency-based Self-supervised Learning for Temporal Anomaly
Localization
- Authors: Aniello Panariello and Angelo Porrello and Simone Calderara and Rita
Cucchiara
- Abstract summary: This work tackles Weakly Supervised Anomaly detection, in which a predictor is allowed to learn from a few labeled anomalies made available during training.
We get inspired by recent advances within the field of self-supervised learning and ask the model to yield the same scores for different augmentations of the same video sequence.
- Score: 35.34342265033686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work tackles Weakly Supervised Anomaly detection, in which a predictor
is allowed to learn not only from normal examples but also from a few labeled
anomalies made available during training. In particular, we deal with the
localization of anomalous activities within the video stream: this is a very
challenging scenario, as training examples come only with video-level
annotations (and not frame-level). Several recent works have proposed various
regularization terms to address it, i.e., by enforcing sparsity and smoothness
constraints over the weakly-learned frame-level anomaly scores. In this work,
we get inspired by recent advances within the field of self-supervised learning
and ask the model to yield the same scores for different augmentations of the
same video sequence. We show that enforcing such an alignment improves the
performance of the model on XD-Violence.
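The regularization terms mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean-squared form of each term, and the weights are assumptions chosen for illustration. Frame-level anomaly scores are plain lists of floats in [0, 1], and the two score lists stand for the model's outputs on two augmentations of the same video sequence.

```python
from typing import Sequence


def sparsity(scores: Sequence[float]) -> float:
    """Mean anomaly score; small when only a few frames are flagged (assumed form)."""
    return sum(scores) / len(scores)


def smoothness(scores: Sequence[float]) -> float:
    """Mean squared difference between consecutive frame scores (assumed form)."""
    diffs = [(b - a) ** 2 for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def consistency(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Mean squared gap between the scores of two augmented views (assumed form)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both views must have the same number of frames")
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)


def regularizer(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    w_sparse: float = 1.0,  # hypothetical weights, not taken from the paper
    w_smooth: float = 1.0,
    w_cons: float = 1.0,
) -> float:
    """Weighted sum of the three terms, computed on view A plus the cross-view gap."""
    return (
        w_sparse * sparsity(scores_a)
        + w_smooth * smoothness(scores_a)
        + w_cons * consistency(scores_a, scores_b)
    )


if __name__ == "__main__":
    view_a = [0.1, 0.1, 0.8, 0.9, 0.2]  # scores for one augmentation
    view_b = [0.1, 0.2, 0.7, 0.9, 0.2]  # scores for another augmentation
    print(regularizer(view_a, view_b))
```

In this sketch the consistency term is zero only when both views receive identical frame-level scores, which is the alignment the abstract asks the model to reach. In practice such a term would be added to a video-level classification loss.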
Related papers
- Temporal Divide-and-Conquer Anomaly Actions Localization in Semi-Supervised Videos with Hierarchical Transformer [0.9208007322096532]
Anomaly action detection and localization play an essential role in security and advanced surveillance systems.
We propose a hierarchical transformer model designed to evaluate the significance of observed actions in anomalous videos.
Our approach segments a parent video hierarchically into multiple temporal children instances and measures the influence of the children nodes in classifying the abnormality of the parent video.
arXiv Detail & Related papers (2024-08-24T18:12:58Z)
- Contrastive Learning Is Not Optimal for Quasiperiodic Time Series [4.2807943283312095]
We introduce Distilled Embedding for Almost-Periodic Time Series (DEAPS) in this paper.
DEAPS is a non-contrastive method tailored for quasiperiodic time series, such as electrocardiogram (ECG) data.
We demonstrate a notable improvement of +10% over existing SOTA methods when just a few annotated records are presented to fit a Machine Learning (ML) model.
arXiv Detail & Related papers (2024-07-24T08:02:41Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- Few-shot Anomaly Detection in Text with Deviation Learning [13.957106119614213]
We introduce FATE, a framework that learns anomaly scores explicitly in an end-to-end method using deviation learning.
Our model is optimized to learn the distinct behavior of anomalies by utilizing a multi-head self-attention layer and multiple instance learning approaches.
arXiv Detail & Related papers (2023-08-22T20:40:21Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be consistently predicted.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Spatio-temporal predictive tasks for abnormal event detection in videos [60.02503434201552]
We propose new constrained pretext tasks to learn object level normality patterns.
Our approach consists in learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics.
Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies.
arXiv Detail & Related papers (2022-10-27T19:45:12Z)
- Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z)
- Unsupervised Pre-training for Temporal Action Localization Tasks [76.01985780118422]
We propose a self-supervised pretext task, coined as Pseudo Action localization (PAL), to Unsupervisedly Pre-train feature encoders for Temporal Action localization tasks (UP-TAL).
Specifically, we first randomly select temporal regions, each of which contains multiple clips, from one video as pseudo actions and then paste them onto different temporal positions of the other two videos.
The pretext task is to align the features of pasted pseudo action regions from two synthetic videos and maximize the agreement between them.
arXiv Detail & Related papers (2022-03-25T12:13:43Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-frame person detectors which have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
- Action Localization through Continual Predictive Learning [14.582013761620738]
We present a new approach based on continual learning that uses feature-level predictions for self-supervision.
We use a stack of LSTMs coupled with a CNN encoder, along with novel attention mechanisms, to model the events in the video and use this model to predict high-level features for the future frames.
This self-supervised framework is not as complicated as other approaches but is very effective in learning robust visual representations for both labeling and localization.
arXiv Detail & Related papers (2020-03-26T23:32:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.