Robust Action Segmentation from Timestamp Supervision
- URL: http://arxiv.org/abs/2210.06501v1
- Date: Wed, 12 Oct 2022 18:01:14 GMT
- Title: Robust Action Segmentation from Timestamp Supervision
- Authors: Yaser Souri, Yazan Abu Farha, Emad Bahrami, Gianpiero Francesca,
Juergen Gall
- Abstract summary: Action segmentation is the task of predicting an action label for each frame of an untrimmed video.
Timestamp supervision is a promising type of weak supervision as obtaining one timestamp per action is less expensive than annotating all frames.
We show that our approach is more robust to missing annotations compared to other approaches and various baselines.
- Score: 18.671808549019833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action segmentation is the task of predicting an action label for each frame
of an untrimmed video. As obtaining annotations to train an approach for action
segmentation in a fully supervised way is expensive, various approaches have
been proposed to train action segmentation models using different forms of weak
supervision, e.g., action transcripts, action sets, or more recently
timestamps. Timestamp supervision is a promising type of weak supervision as
obtaining one timestamp per action is less expensive than annotating all
frames, but it provides more information than other forms of weak supervision.
However, previous works assume that every action instance is annotated with a
timestamp, which is a restrictive assumption since it assumes that annotators
do not miss any action. In this work, we relax this restrictive assumption and
take missing annotations for some action instances into account. We show that
our approach is more robust to missing annotations compared to other approaches
and various baselines.
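The timestamp setting can be made concrete with a minimal, hypothetical sketch. This is only a naive nearest-timestamp baseline for turning sparse timestamps into frame-wise pseudo-labels, not the paper's method (which refines labels using model output and explicitly accounts for missing annotations); all names and data below are illustrative.

```python
# Naive baseline sketch (hypothetical data, not the paper's approach):
# expand sparse timestamp annotations into frame-wise pseudo-labels by
# assigning every frame the label of its nearest annotated timestamp.

def propagate_timestamps(num_frames, timestamps):
    """timestamps: list of (frame_index, action_label) pairs,
    one per annotated action instance."""
    labels = []
    for f in range(num_frames):
        # pick the annotated timestamp closest to this frame
        nearest = min(timestamps, key=lambda t: abs(t[0] - f))
        labels.append(nearest[1])
    return labels

# One timestamp per action instance in a toy 10-frame clip:
# frames near timestamp 2 get "pour", frames near timestamp 7 get "stir".
frame_labels = propagate_timestamps(10, [(2, "pour"), (7, "stir")])
```

A baseline like this fails exactly in the case the paper targets: if an annotator misses an action instance entirely, its frames are silently absorbed by a neighboring action's label, which is why robustness to missing annotations matters.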
Related papers
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Distill and Collect for Semi-Supervised Temporal Action Segmentation [0.0]
We propose an approach for the temporal action segmentation task that can simultaneously leverage knowledge from annotated and unannotated video sequences.
Our approach uses multi-stream distillation that repeatedly refines and finally combines their frame predictions.
Our model also predicts the action order, which is later used as a temporal constraint while estimating frame labels to counter the lack of supervision for unannotated videos.
arXiv Detail & Related papers (2022-11-02T17:34:04Z)
- A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation [79.436224998992]
In temporal action segmentation, timestamp supervision requires only a handful of labelled frames per video sequence.
We propose a novel Expectation-Maximization based approach that leverages the label uncertainty of unlabelled frames.
Our proposed method produces SOTA results and even exceeds the fully-supervised setup in several metrics and datasets.
arXiv Detail & Related papers (2022-07-20T18:30:48Z)
- Video Moment Retrieval from Text Queries via Single Frame Annotation [65.92224946075693]
Video moment retrieval aims at finding the start and end timestamps of a moment described by a given natural language query.
Fully supervised methods need complete temporal boundary annotations to achieve promising results.
We propose a new paradigm called "glance annotation".
arXiv Detail & Related papers (2022-04-20T11:59:17Z)
- End-to-end Temporal Action Detection with Transformer [86.80289146697788]
Temporal action detection (TAD) aims to determine the semantic label and the boundaries of every action instance in an untrimmed video.
Here, we construct an end-to-end framework for TAD upon Transformer, termed TadTR.
Our method achieves state-of-the-art performance on HACS Segments and THUMOS14 and competitive performance on ActivityNet-1.3.
arXiv Detail & Related papers (2021-06-18T17:58:34Z)
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z)
- Temporal Action Segmentation from Timestamp Supervision [25.49797678477498]
We introduce timestamp supervision for the temporal action segmentation task.
Timestamps require a comparable annotation effort to weakly supervised approaches.
Our approach uses the model output and the annotated timestamps to generate frame-wise labels.
arXiv Detail & Related papers (2021-03-11T13:52:41Z)
- Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting [22.86745487695168]
We propose a baseline based on multi-instance and multi-label learning.
We propose a novel approach that uses sets of actions as representation instead of modeling individual action classes.
We evaluate the proposed approach on the challenging dataset where the proposed approach outperforms the MIML baseline and is competitive to fully supervised approaches.
arXiv Detail & Related papers (2021-01-21T11:59:47Z)
- Weakly Supervised Temporal Action Localization with Segment-Level Labels [140.68096218667162]
Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting: segments are labeled when annotators observe actions happening within them.
We devise a partial segment loss regarded as a loss sampling to learn integral action parts from labeled segments.
arXiv Detail & Related papers (2020-07-03T10:32:19Z)
- Revisiting Few-shot Activity Detection with Class Similarity Control [107.79338380065286]
We present a framework for few-shot temporal activity detection based on proposal regression.
Our model is end-to-end trainable, takes into account the frame rate differences between few-shot activities and untrimmed test videos, and can benefit from additional few-shot examples.
arXiv Detail & Related papers (2020-03-31T22:02:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.