Self-Supervised Learning for Semi-Supervised Temporal Action Proposal
- URL: http://arxiv.org/abs/2104.03214v1
- Date: Wed, 7 Apr 2021 16:03:25 GMT
- Title: Self-Supervised Learning for Semi-Supervised Temporal Action Proposal
- Authors: Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Changxin Gao and
Nong Sang
- Abstract summary: We design an effective Self-supervised Semi-supervised Temporal Action Proposal (SSTAP) framework.
The SSTAP contains two crucial branches, i.e., a temporal-aware semi-supervised branch and a relation-aware self-supervised branch.
We extensively evaluate the proposed SSTAP on THUMOS14 and ActivityNet v1.3 datasets.
- Score: 42.6254639252739
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning has shown remarkable performance in exploiting unlabeled data for various video tasks. In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation. In particular, we design an effective Self-supervised Semi-supervised Temporal Action Proposal (SSTAP) framework. SSTAP contains two crucial branches, i.e., a temporal-aware semi-supervised branch and a relation-aware self-supervised branch. The semi-supervised branch improves the proposal model by introducing two temporal perturbations, i.e., temporal feature shift and temporal feature flip, in the mean teacher framework. The self-supervised branch defines two pretext tasks, masked feature reconstruction and clip-order prediction, to learn the relations among temporal clues. In this way, SSTAP can better exploit unlabeled videos and improve the discriminative ability of the learned action features. We extensively evaluate the proposed SSTAP on the THUMOS14 and ActivityNet v1.3 datasets. The experimental results demonstrate that SSTAP significantly outperforms state-of-the-art semi-supervised methods and even matches fully-supervised methods. Code is available at https://github.com/wangxiang1230/SSTAP.
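As a rough, non-authoritative illustration of the mechanisms named in the abstract, the sketch below shows how the two temporal perturbations of the semi-supervised branch (temporal feature shift and temporal feature flip) and a clip-order prediction target for the self-supervised branch could be implemented on snippet-level features of shape (batch, T, C). The function names and the max_shift / num_clips parameters are illustrative assumptions, not the released SSTAP code.

```python
# Illustrative sketch only (not the authors' implementation), assuming
# snippet-level video features of shape (batch, T, C).
import torch

def temporal_feature_shift(x: torch.Tensor, max_shift: int = 2) -> torch.Tensor:
    """Circularly shift the temporal axis by a small random offset."""
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(x, shifts=shift, dims=1)

def temporal_feature_flip(x: torch.Tensor) -> torch.Tensor:
    """Reverse the temporal order of the feature sequence."""
    return torch.flip(x, dims=[1])

def make_clip_order_sample(x: torch.Tensor, num_clips: int = 4):
    """Split the sequence into clips, shuffle them, and return the applied
    permutation as the target for a clip-order prediction head."""
    clips = x.chunk(num_clips, dim=1)
    perm = torch.randperm(num_clips)
    shuffled = torch.cat([clips[i] for i in perm.tolist()], dim=1)
    return shuffled, perm
```

In a mean-teacher style setup, the student would see the perturbed features while the teacher sees the original sequence, and a consistency loss on their proposal predictions lets unlabeled videos contribute to training; the clip-order target would similarly feed a small classification head.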
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Self-Supervised Representation Learning from Temporal Ordering of
Automated Driving Sequences [49.91741677556553]
We propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks.
We embed each frame as an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems (a generic sketch of such an ordering pretext task is given below).
Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods.
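Under generic assumptions, a frame-ordering pretext task on region-level features could look roughly like the following sketch: each frame's unordered proposal set is pooled into a permutation-invariant embedding, a few frames are shuffled, and a small head classifies which permutation was applied. The pooling choice, frame count, and head architecture are assumptions for illustration, not the TempO formulation.

```python
# Generic illustration of a frame-ordering pretext task on region-level
# features; hyper-parameters and the pooling/head design are assumed.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_FRAMES = 3
PERMS = list(itertools.permutations(range(NUM_FRAMES)))  # 3! = 6 order classes

class OrderPredictionHead(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(NUM_FRAMES * feat_dim, 512),
            nn.ReLU(),
            nn.Linear(512, len(PERMS)),
        )

    def forward(self, frame_proposal_sets):
        # frame_proposal_sets: list of NUM_FRAMES tensors, each (num_proposals_i, feat_dim)
        pooled = [p.mean(dim=0) for p in frame_proposal_sets]  # order-invariant pooling
        perm_id = int(torch.randint(len(PERMS), (1,)))
        shuffled = torch.cat([pooled[i] for i in PERMS[perm_id]], dim=0)
        logits = self.classifier(shuffled.unsqueeze(0))
        return F.cross_entropy(logits, torch.tensor([perm_id]))
```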
arXiv Detail & Related papers (2023-02-17T18:18:27Z) - Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task, semi-supervised temporal action localization (TAL).
We propose an effective active learning method, named AL-STAL.
Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art methods in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance
Video [128.41392860714635]
We introduce Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.
WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses the abnormal event.
We propose a dual-branch network which takes as input proposals with multiple granularities in both spatial and temporal domains.
arXiv Detail & Related papers (2021-08-09T06:11:14Z) - Unsupervised Action Segmentation with Self-supervised Feature Learning
and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame of a video with an action label.
In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos.
We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube.
arXiv Detail & Related papers (2020-06-22T17:55:59Z)