Distill and Collect for Semi-Supervised Temporal Action Segmentation
- URL: http://arxiv.org/abs/2211.01311v2
- Date: Thu, 3 Nov 2022 17:45:26 GMT
- Title: Distill and Collect for Semi-Supervised Temporal Action Segmentation
- Authors: Sovan Biswas, Anthony Rhodes, Ramesh Manuvinakurike, Giuseppe Raffa,
Richard Beckwith
- Abstract summary: We propose an approach for the temporal action segmentation task that can simultaneously leverage knowledge from annotated and unannotated video sequences.
Our approach uses multi-stream distillation that repeatedly refines and finally combines the streams' frame predictions.
Our model also predicts the action order, which is later used as a temporal constraint while estimating frame labels to counter the lack of supervision for unannotated videos.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent temporal action segmentation approaches need frame annotations during
training to be effective. These annotations are very expensive and
time-consuming to obtain. This limits their performance when only limited
annotated data is available. In contrast, we can easily collect a large corpus
of in-domain unannotated videos by scavenging through the internet. Thus, this
paper proposes an approach for the temporal action segmentation task that can
simultaneously leverage knowledge from annotated and unannotated video
sequences. Our approach uses multi-stream distillation that repeatedly refines
and finally combines the streams' frame predictions. Our model also predicts the
action order, which is later used as a temporal constraint while estimating
frame labels to counter the lack of supervision for unannotated videos. In the
end, our evaluation of the proposed approach on two different datasets
demonstrates that it achieves performance comparable to full supervision
despite the limited annotations.
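For unannotated videos, the key mechanism is the predicted action order acting as a temporal constraint on the frame labels. Below is a minimal sketch of how such an ordering constraint can be enforced in general: a Viterbi-style dynamic program that assigns per-frame labels consistent with a given action order while maximizing the model's frame log-likelihoods. All names are illustrative; the paper's actual inference procedure and its multi-stream distillation are not reproduced here.

```python
import numpy as np

def align_frames_to_order(log_probs, transcript):
    """Assign one label per frame so that the labels follow `transcript`
    order while maximizing the summed frame log-likelihood.

    log_probs:  (T, C) per-frame log class probabilities from the model.
    transcript: ordered list of class indices the video must follow.
    """
    T = log_probs.shape[0]
    S = len(transcript)
    assert T >= S, "need at least one frame per action in the order"
    dp = np.full((T, S), -np.inf)            # best score ending at step s
    from_prev = np.zeros((T, S), dtype=bool)
    dp[0, 0] = log_probs[0, transcript[0]]
    for t in range(1, T):
        for s in range(min(S, t + 1)):
            stay = dp[t - 1, s]                               # same action
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf  # next action
            dp[t, s] = max(stay, advance) + log_probs[t, transcript[s]]
            from_prev[t, s] = advance > stay
    # Backtrack from the last action of the order at the last frame.
    labels = np.empty(T, dtype=int)
    s = S - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[s]
        if from_prev[t, s]:
            s -= 1
    return labels

# Toy usage: 6 frames, 3 classes, predicted order "class 0, then class 2".
probs = np.array([[.8, .1, .1], [.7, .2, .1], [.5, .2, .3],
                  [.1, .2, .7], [.1, .1, .8], [.2, .1, .7]])
print(align_frames_to_order(np.log(probs), [0, 2]))  # [0 0 0 2 2 2]
```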
Related papers
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
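As a loose illustration of the temporal-aware embedding idea above, this sketch (my construction, assuming PyTorch; not the TAEC architecture) trains an encoder with a feature-reconstruction head and a relative-time head, so that frame embeddings encode temporal position before clustering; the sequence-to-sequence component is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAwareEmbedder(nn.Module):
    """Encoder with two auxiliary heads: feature reconstruction and
    relative-time regression, so frame embeddings encode where in the
    video a frame sits before they are clustered into actions."""
    def __init__(self, feat_dim, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.decoder = nn.Linear(emb_dim, feat_dim)   # reconstruct features
        self.time_head = nn.Linear(emb_dim, 1)        # predict t / T

    def forward(self, frames):                        # frames: (T, feat_dim)
        z = self.encoder(frames)
        t_hat = torch.sigmoid(self.time_head(z)).squeeze(-1)
        return z, self.decoder(z), t_hat

def pretext_loss(model, frames):
    T = frames.shape[0]
    rel_time = torch.arange(T, dtype=torch.float32) / max(T - 1, 1)
    _, recon, t_hat = model(frames)
    return F.mse_loss(recon, frames) + F.mse_loss(t_hat, rel_time)

# Toy usage on random pre-computed frame features; the embeddings z would
# then be clustered into action classes.
model = TimeAwareEmbedder(feat_dim=32)
loss = pretext_loss(model, torch.randn(100, 32))
loss.backward()
```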
- Robust Action Segmentation from Timestamp Supervision
Action segmentation is the task of predicting an action label for each frame of an untrimmed video.
Timestamp supervision is a promising type of weak supervision as obtaining one timestamp per action is less expensive than annotating all frames.
We show that our approach is more robust to missing annotations compared to other approaches and various baselines.
arXiv Detail & Related papers (2022-10-12T18:01:14Z)
- A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation
In temporal action segmentation, timestamp supervision requires only a handful of labelled frames per video sequence.
We propose a novel Expectation-Maximization based approach that leverages the label uncertainty of unlabelled frames.
Our proposed method produces SOTA results and even exceeds the fully-supervised setup in several metrics and datasets.
arXiv Detail & Related papers (2022-07-20T18:30:48Z)
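For intuition about the Expectation-Maximization framing above, here is an illustrative hard-EM step (my simplification, not the paper's derivation): between each pair of consecutive annotated timestamps, the action boundary is re-estimated from the current model's frame probabilities, and the resulting pseudo-labels would then be used to retrain the segmentation model.

```python
import numpy as np

def hard_em_pseudo_labels(frame_probs, timestamps, ts_labels, eps=1e-8):
    """Between each pair of consecutive timestamps (ta, la) and (tb, lb),
    place the action boundary where the summed frame log-likelihood is
    maximal. Returns frame-wise pseudo-labels (-1 = left unsupervised)."""
    pseudo = np.full(frame_probs.shape[0], -1)
    anchors = list(zip(timestamps, ts_labels))
    for (ta, la), (tb, lb) in zip(anchors, anchors[1:]):
        lp_left = np.log(frame_probs[ta:tb + 1, la] + eps)
        lp_right = np.log(frame_probs[ta:tb + 1, lb] + eps)
        prefix = np.cumsum(lp_left)                # left action up to k
        suffix = np.cumsum(lp_right[::-1])[::-1]   # right action from k+1
        n = len(prefix)
        # k = last frame offset of the left action; both anchor frames
        # always keep their annotated labels.
        scores = [prefix[k] + suffix[k + 1] for k in range(n - 1)]
        k = int(np.argmax(scores))
        pseudo[ta:ta + k + 1] = la
        pseudo[ta + k + 1:tb + 1] = lb
    return pseudo

# Toy usage: 5 frames, 2 classes, timestamps at frames 0 and 4.
probs = np.array([[.9, .1], [.8, .2], [.3, .7], [.2, .8], [.1, .9]])
print(hard_em_pseudo_labels(probs, [0, 4], [0, 1]))  # [0 0 1 1 1]
```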
- Video Moment Retrieval from Text Queries via Single Frame Annotation
Video moment retrieval aims at finding the start and end timestamps of a moment described by a given natural language query.
Fully supervised methods need complete temporal boundary annotations to achieve promising results.
We propose a new paradigm called "glance annotation", which requires only a single annotated frame per moment.
arXiv Detail & Related papers (2022-04-20T11:59:17Z)
- Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
We develop a more accurate weakly-supervised solution by introducing Cross-Sentence Relations Mining.
We explore two cross-sentence relational constraints: (1) trimmed ordering and (2) semantic consistency among sentences in a paragraph description of video activities.
Experiments on two publicly available activity localisation datasets show the advantages of our approach over the state-of-the-art weakly supervised methods.
arXiv Detail & Related papers (2021-07-23T20:04:01Z)
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z)
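The sketch below illustrates one simple way to realize temporally-weighted agglomerative grouping (an adjacent-merge variant with a hand-picked weighting, not the authors' exact algorithm): neighbouring segments are merged while their combined feature and temporal distance is smallest.

```python
import numpy as np

def temporally_weighted_segments(features, num_segments, time_weight=0.5):
    """Greedily merge the pair of *adjacent* segments whose combined
    feature + temporal distance is smallest, until `num_segments` remain.
    Returns the segment boundaries as frame indices."""
    T = features.shape[0]
    # one segment per frame: (mean feature, mean relative time, size)
    segs = [(features[t].astype(float), t / T, 1) for t in range(T)]
    bounds = list(range(T + 1))     # segment j spans bounds[j]:bounds[j+1]
    while len(segs) > num_segments:
        dists = []
        for (f1, t1, _), (f2, t2, _) in zip(segs, segs[1:]):
            d = (1 - time_weight) * np.linalg.norm(f1 - f2) \
                + time_weight * abs(t1 - t2)
            dists.append(d)
        i = int(np.argmin(dists))
        (f1, t1, n1), (f2, t2, n2) = segs[i], segs[i + 1]
        n = n1 + n2
        segs[i:i + 2] = [((f1 * n1 + f2 * n2) / n,
                          (t1 * n1 + t2 * n2) / n, n)]
        del bounds[i + 1]
    return bounds

# Toy usage: two constant "actions" of 5 frames each.
feats = np.concatenate([np.zeros((5, 8)), np.ones((5, 8))])
print(temporally_weighted_segments(feats, num_segments=2))  # [0, 5, 10]
```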
- Temporal Action Segmentation from Timestamp Supervision
We introduce timestamp supervision for the temporal action segmentation task.
Timestamps require a comparable annotation effort to weakly supervised approaches.
Our approach uses the model output and the annotated timestamps to generate frame-wise labels.
arXiv Detail & Related papers (2021-03-11T13:52:41Z)
- Efficient video annotation with visual interpolation and frame selection guidance
We introduce a unified framework for generic video annotation with bounding boxes.
We show that our approach reduces actual measured annotation time by 50% compared to commonly used linear methods.
arXiv Detail & Related papers (2020-12-23T09:31:40Z)
- Boundary-sensitive Pre-training for Temporal Localization in Videos
We investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task.
With the synthesized boundaries, BSP can be simply conducted via classifying the boundary types.
Extensive experiments show that the proposed BSP is superior and complementary to the existing action-classification-based pre-training counterpart.
arXiv Detail & Related papers (2020-11-21T17:46:24Z)
- Weakly Supervised Temporal Action Localization with Segment-Level Labels
Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting: segments are labeled when annotators observe an action happening within them.
We devise a partial segment loss, regarded as a form of loss sampling, to learn integral action parts from the labeled segments.
arXiv Detail & Related papers (2020-07-03T10:32:19Z)
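As a minimal sketch of the segment-level supervision idea (assuming PyTorch; the paper's actual partial segment loss and its sampling scheme are more involved), the cross-entropy below is accumulated only over frames inside labeled segments, so unlabeled regions contribute no gradient.

```python
import torch
import torch.nn.functional as F

def partial_segment_loss(frame_logits, segments):
    """frame_logits: (T, C) per-frame class scores.
    segments: list of (start, end, label); frames outside every labeled
    segment carry no supervision and receive no gradient."""
    losses = []
    for start, end, label in segments:
        logits = frame_logits[start:end + 1]                   # (n, C)
        target = torch.full((end - start + 1,), label, dtype=torch.long)
        losses.append(F.cross_entropy(logits, target))
    return torch.stack(losses).mean()

# Toy usage: 10 frames, 3 classes, two labeled segments.
logits = torch.randn(10, 3, requires_grad=True)
partial_segment_loss(logits, [(0, 3, 1), (6, 9, 2)]).backward()
print(logits.grad[4:6].abs().sum())  # tensor(0.): unlabeled frames untouched
```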
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.