Multi-Task Learning based Video Anomaly Detection with Attention
- URL: http://arxiv.org/abs/2210.07697v2
- Date: Thu, 11 May 2023 13:41:27 GMT
- Title: Multi-Task Learning based Video Anomaly Detection with Attention
- Authors: Mohammad Baradaran and Robert Bergevin
- Abstract summary: We propose a novel multi-task learning based method that combines complementary proxy tasks to better consider the motion and appearance features.
We combine the semantic segmentation and future frame prediction tasks in a single branch to learn the object class and consistent motion patterns.
In the second branch, we add several attention mechanisms to detect motion anomalies with attention to object parts, the direction of motion, and the distance of the objects from the camera.
- Score: 1.2944868613449219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning based video anomaly detection methods combine multiple
proxy tasks in different branches to detect video anomalies in different
situations. Most existing methods either do not combine complementary tasks to
effectively cover all motion patterns or do not explicitly consider the class
of the objects. To address these shortcomings, we propose a
novel multi-task learning based method that combines complementary proxy tasks
to better consider the motion and appearance features. We combine the semantic
segmentation and future frame prediction tasks in a single branch to learn the
object class and consistent motion patterns, and to detect respective anomalies
simultaneously. In the second branch, we add several attention mechanisms to
detect motion anomalies with attention to object parts, the direction of
motion, and the distance of the objects from the camera. Our qualitative
results show that the proposed method considers the object class effectively
and learns motion with attention to the aforementioned factors, which
results in more precise motion modeling and better motion anomaly detection.
Additionally, quantitative results show the superiority of our method compared
with state-of-the-art methods.
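The abstract describes two branches but does not say how their outputs are merged into a single anomaly score. As a rough illustration only, the sketch below assumes each branch yields one error value per frame and fuses them by min-max normalization followed by a weighted sum. The function names, the `weight` parameter, and the fusion rule itself are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale per-frame scores to [0, 1]; a constant sequence maps to all zeros."""
    if not scores:
        raise ValueError("scores must not be empty")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_branch_scores(
    branch1_errors: Sequence[float],
    branch2_errors: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Hypothetical late fusion of two branches' per-frame errors.

    branch1_errors: assumed errors from the segmentation + future-frame branch.
    branch2_errors: assumed errors from the attention-based motion branch.
    weight: assumed mixing coefficient for branch 1 (branch 2 gets 1 - weight).
    """
    if len(branch1_errors) != len(branch2_errors):
        raise ValueError("both branches must score the same number of frames")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    a = min_max_normalize(branch1_errors)
    b = min_max_normalize(branch2_errors)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Frame 3 is flagged only by branch 1, frame 4 only by branch 2.
fused = fuse_branch_scores([0.1, 0.1, 0.9, 0.1], [0.2, 0.2, 0.2, 0.8])
print(fused)
```

Normalizing each branch before mixing keeps one branch from dominating just because its raw errors are on a larger scale; other fusion rules (for example taking the maximum) would be equally plausible under these assumptions.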
Related papers
- Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy [12.257725479880458]
Action recognition has become one of the popular research topics in computer vision.
We propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos.
Our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets.
arXiv Detail & Related papers (2024-05-02T14:43:21Z)
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Multi-Task Learning of Object State Changes from Uncurated Videos [55.60442251060871]
We learn to temporally localize object state changes by observing people interacting with objects in long uncurated web videos.
We show that our multi-task model achieves a relative improvement of 40% over the prior single-task methods.
We also test our method on long egocentric videos of the EPIC-KITCHENS and the Ego4D datasets in a zero-shot setup.
arXiv Detail & Related papers (2022-11-24T09:42:46Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)
- Masked Contrastive Learning for Anomaly Detection [10.499890749386676]
We propose a task-specific variant of contrastive learning named masked contrastive learning.
We also propose a new inference method dubbed self-ensemble inference.
arXiv Detail & Related papers (2021-05-18T19:27:02Z)
- Video Relation Detection with Trajectory-aware Multi-modal Features [13.358584829993193]
We present video relation detection with trajectory-aware multi-modal features to solve this task.
Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP.
arXiv Detail & Related papers (2021-01-20T14:49:02Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.