Leveraging Action Affinity and Continuity for Semi-supervised Temporal
Action Segmentation
- URL: http://arxiv.org/abs/2207.08653v2
- Date: Thu, 21 Jul 2022 05:46:37 GMT
- Title: Leveraging Action Affinity and Continuity for Semi-supervised Temporal
Action Segmentation
- Authors: Guodong Ding and Angela Yao
- Abstract summary: We present a semi-supervised learning approach to the temporal action segmentation task.
The goal of the task is to temporally detect and segment actions in long, untrimmed procedural videos.
We propose two novel loss functions for the unlabelled data: an action affinity loss and an action continuity loss.
- Score: 24.325716686674042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a semi-supervised learning approach to the temporal action
segmentation task. The goal of the task is to temporally detect and segment
actions in long, untrimmed procedural videos, where only a small set of videos
is densely labelled and a large collection of videos is unlabelled. To this
end, we propose two novel loss functions for the unlabelled data: an action
affinity loss and an action continuity loss. The action affinity loss guides
learning on the unlabelled samples by imposing action priors induced from the
labelled set. The action continuity loss enforces the temporal continuity of
actions and additionally provides frame-wise classification supervision. In
addition, we propose an Adaptive Boundary Smoothing (ABS) approach to build
coarser action boundaries for more robust and reliable learning. The proposed
loss functions and ABS were evaluated on three benchmarks. Results show that
they significantly improved action segmentation performance with low amounts
(5% and 10%) of labelled data and achieved results comparable to full
supervision with 50% labelled data. Furthermore, ABS succeeded in boosting
performance when integrated into fully-supervised learning.
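The abstract describes the two unsupervised losses only at a high level. Below is a minimal PyTorch sketch of how losses of this kind are commonly formulated; the KL-divergence form of the affinity term, the truncated smoothing form of the continuity term, the `action_prior` input, and the 0.15 loss weight are illustrative assumptions, not the authors' exact definitions.

```python
import torch
import torch.nn.functional as F

def action_affinity_loss(logits, action_prior):
    """Match the video-level action distribution of an unlabelled video to an
    action prior induced from the labelled set (illustrative KL form; the
    paper's exact definition may differ).

    logits:       (T, C) frame-wise class scores for one unlabelled video
    action_prior: (C,)   action-frequency prior estimated on labelled data
    """
    video_dist = F.softmax(logits, dim=1).mean(dim=0)   # (C,) video-level distribution
    # KL(prior || prediction): F.kl_div expects log-probs as input, probs as target
    return F.kl_div(video_dist.clamp_min(1e-8).log(), action_prior, reduction="sum")

def action_continuity_loss(logits, tau=4.0):
    """Penalise abrupt frame-to-frame changes in class log-probabilities,
    encouraging temporally continuous action segments (a truncated MSE on
    adjacent log-probs, in the spirit of common smoothing losses).
    """
    log_probs = F.log_softmax(logits, dim=1)            # (T, C)
    delta = log_probs[1:] - log_probs[:-1].detach()     # change between adjacent frames
    return delta.clamp(-tau, tau).pow(2).mean()

# Toy usage: one unlabelled video with 100 frames and 10 action classes
logits = torch.randn(100, 10, requires_grad=True)
prior = torch.full((10,), 0.1)                          # uniform stand-in prior
loss = action_affinity_loss(logits, prior) + 0.15 * action_continuity_loss(logits)
loss.backward()
```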
Related papers
- Progression-Guided Temporal Action Detection in Videos [20.02711550239915]
We present a novel framework, Action Progression Network (APN), for temporal action detection (TAD) in videos.
The framework locates actions in videos by detecting the action evolution process.
We quantify a complete action process into 101 ordered stages and train a neural network to recognize the action progressions.
arXiv Detail & Related papers (2023-08-18T03:14:05Z)
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
We then cluster the learned embeddings and, based on the identified clusters, decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end approach that effectively utilizes the unlabeled data.
Video action detection requires both action class prediction and spatio-temporal localization of actions.
We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
arXiv Detail & Related papers (2022-03-08T18:11:25Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) consecutively regulates the intermediate representation to emphasize the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art methods in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Flip Learning: Erase to Segment [65.84901344260277]
Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation.
We propose a novel and general WSS framework called Flip Learning, which requires only box annotations.
Our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.
arXiv Detail & Related papers (2021-08-02T09:56:10Z)
- Unsupervised Action Segmentation with Self-supervised Feature Learning and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame in a video with an action label.
In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos.
We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z)
- Weakly Supervised Temporal Action Localization with Segment-Level Labels [140.68096218667162]
Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting: a segment is labeled when an annotator observes an action happening in it.
We devise a partial segment loss, regarded as a form of loss sampling, to learn integral action parts from the labeled segments.
arXiv Detail & Related papers (2020-07-03T10:32:19Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatial-temporal discontinuity; (iii) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
- On Evaluating Weakly Supervised Action Segmentation Methods [79.42955857919497]
We focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches.
We train each method on the Breakfast dataset 5 times and provide average and standard deviation of the results.
Our experiments show that the standard deviation over these repetitions is between 1 and 2.5% and significantly affects the comparison between different approaches.
arXiv Detail & Related papers (2020-05-19T20:30:31Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
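For the zero-shot setting described in the ZSTAD entry above, the core idea can be sketched as scoring clip features against semantic embeddings of activity names, so that activities never seen in training can still be assigned; the cosine-similarity scoring, the 300-d embeddings, and the function name below are illustrative assumptions, not the paper's actual R-C3D-based architecture.

```python
import torch
import torch.nn.functional as F

def zero_shot_scores(clip_feature, label_embeddings):
    """Score one clip against activity-label embeddings by cosine similarity.

    clip_feature:     (D,)    visual feature projected into the label space
    label_embeddings: (C, D)  semantic embeddings of activity names,
                              including classes never seen in training
    Returns (C,) similarity scores; argmax gives the predicted activity.
    """
    clip = F.normalize(clip_feature, dim=0)
    labels = F.normalize(label_embeddings, dim=1)
    return labels @ clip

# Toy usage: 5 activity labels (some unseen at training time), 300-d embeddings
feat = torch.randn(300)
label_emb = torch.randn(5, 300)
pred = zero_shot_scores(feat, label_emb).argmax().item()
```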