Related papers: Semi-supervised Active Learning for Video Action Detection

Semi-supervised Active Learning for Video Action Detection

URL: http://arxiv.org/abs/2312.07169v3
Date: Wed, 3 Apr 2024 15:11:33 GMT
Title: Semi-supervised Active Learning for Video Action Detection
Authors: Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat,
Abstract summary: We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data. We evaluate the proposed approach on three different benchmark datasets, UCF-24-101, JHMDB-21, and Youtube-VOS.
Score: 8.110693267550346
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we focus on label efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo label for SSL in video action detection by emphasizing on relevant activity region within a video. We evaluate the proposed approach on three different benchmark datasets, UCF-101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos. The code and models is publicly available at: \url{https://github.com/AKASH2907/semi-sup-active-learning}.

Related papers

Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary-temporal action detection (OV-STAD) is an important fine-grained video understanding task. OV-STAD requires training a model on a limited set of base classes with box and label supervision. To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse-temporal patterns. Current algorithms face significant challenges when learning from limited annotated data. We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model [0.0]
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
arXiv Detail & Related papers (2022-09-23T08:29:16Z)
Enabling Weakly-Supervised Temporal Action Localization from On-Device Learning of the Video Stream [5.215681853828831]
We propose an efficient video learning approach to learn from a long, untrimmed streaming video. To the best of our knowledge, we are the first attempt to directly learn from the on-device, long video stream.
arXiv Detail & Related papers (2022-08-25T13:41:03Z)
End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end based approach effectively which utilizes the unlabeled data. Video action detection requires both, action class prediction as well as a-temporal consistency. We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
arXiv Detail & Related papers (2022-03-08T18:11:25Z)
Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework. It learns the distinction of target category videos and the characteristics of highlight moments on source video category. It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z)
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos. Instead of trying to filter out, we propose to convert the potential noises in these queried videos to useful supervision signals. We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z)
Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos. The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging. We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
Delving into 3D Action Anticipation from Streaming Videos [99.0155538452263]
Action anticipation aims to recognize the action with a partial observation. We introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification. We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label.
arXiv Detail & Related papers (2019-06-15T10:30:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.