Temporal Action Segmentation: An Analysis of Modern Techniques
- URL: http://arxiv.org/abs/2210.10352v5
- Date: Sat, 21 Oct 2023 04:57:29 GMT
- Title: Temporal Action Segmentation: An Analysis of Modern Techniques
- Authors: Guodong Ding, Fadime Sener, and Angela Yao
- Abstract summary: Temporal action segmentation (TAS) aims to densely label every frame of a minutes-long video with one of multiple action classes.
Despite the rapid growth of TAS techniques in recent years, no systematic survey of the area has been conducted.
This survey analyzes and summarizes the most significant contributions and trends.
- Score: 43.725939095985915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action segmentation (TAS) in videos aims to densely label
the frames of minutes-long videos with multiple action classes. For this
long-range video understanding task, researchers have developed an extensive
collection of methods and examined their performance on various benchmarks.
Despite the rapid growth of TAS techniques in recent years, no systematic
survey of the area has been conducted. This survey analyzes and summarizes
the most significant contributions and trends. In particular, we first examine
the task definition, common benchmarks, types of supervision, and prevalent
evaluation measures. In addition, we systematically investigate two essential
techniques of this topic, i.e., frame representation and temporal modeling,
which have been studied extensively in the literature. We then conduct a
thorough review of existing TAS works categorized by their levels of
supervision and conclude our survey by identifying and emphasizing several
research gaps. In addition, we have curated a list of TAS resources, which is
available at https://github.com/nus-cvml/awesome-temporal-action-segmentation.
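The survey covers prevalent evaluation measures for this dense frame-labeling task. As a hedged illustration (the helper names below are our own, not from the paper), frame-wise accuracy and the collapse of frame labels into segments, the representation that segmental metrics such as edit score and F1@k operate on, can be sketched as:

```python
def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted action label matches the ground truth."""
    assert len(pred) == len(gt)
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def to_segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) segments,
    with `end` exclusive."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[i - 1], start, i))
            start = i
    return segments

gt   = ["pour", "pour", "stir", "stir", "stir", "serve"]
pred = ["pour", "stir", "stir", "stir", "stir", "serve"]
print(frame_accuracy(pred, gt))  # 0.8333...
print(to_segments(pred))         # [('pour', 0, 1), ('stir', 1, 5), ('serve', 5, 6)]
```

One frame-level error that splits or extends a segment can hurt segmental metrics far more than frame accuracy, which is why TAS papers report both.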
Related papers
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- Temporal Sentence Grounding in Streaming Videos [60.67022943824329]
This paper aims to tackle a novel task: Temporal Sentence Grounding in Streaming Videos (TSGSV).
The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.
We propose two novel methods: (1) a TwinNet structure that enables the model to learn about upcoming events; and (2) a language-guided feature compressor that eliminates redundant visual frames.
arXiv Detail & Related papers (2023-08-14T12:30:58Z)
- A Survey on Deep Learning-based Spatio-temporal Action Detection [8.456482280676884]
STAD aims to classify the actions present in a video and localize them in space and time.
It has become a particularly active area of research in computer vision because of its explosively emerging real-world applications.
This paper provides a comprehensive review of the state-of-the-art deep learning-based methods for STAD.
arXiv Detail & Related papers (2023-08-03T08:48:14Z)
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
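A hedged sketch of that decoding step, assuming nearest-centroid assignment of frame embeddings followed by merging of consecutive same-cluster frames (an illustrative simplification, not TAEC's exact procedure):

```python
def decode_segments(frame_embeddings, centroids):
    """Assign each frame to its nearest cluster centroid (squared Euclidean
    distance), then merge runs of consecutive frames that share a cluster
    into (cluster, start, end) temporal segments, with `end` exclusive."""
    def nearest(e):
        return min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(e, centroids[k])))
    labels = [nearest(e) for e in frame_embeddings]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[i - 1], start, i))
            start = i
    return segments

# Toy 2-D frame embeddings and two cluster centroids.
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9), (0.0, 0.1)]
centroids = [(0.0, 0.0), (1.0, 1.0)]
print(decode_segments(frames, centroids))  # [(0, 0, 2), (1, 2, 4), (0, 4, 5)]
```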
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
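Detected action boundaries are conventionally scored by their temporal overlap with ground-truth intervals. A minimal sketch of temporal IoU, the standard matching criterion (the function name here is ours):

```python
def temporal_iou(a, b):
    """Intersection-over-union of two temporal intervals given as
    (start, end) pairs in seconds or frame indices."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # 0.3333...
```

A prediction typically counts as correct when its IoU with a ground-truth interval exceeds a threshold (e.g. 0.5), and benchmarks report mean average precision over several thresholds.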
arXiv Detail & Related papers (2021-09-30T22:42:25Z)
- A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query.
To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Few-Shot Action Localization without Knowing Boundaries [9.959844922120523]
We show that it is possible to learn to localize actions in untrimmed videos when only one/few trimmed examples of the target action are available at test time.
We propose a network that learns to estimate Temporal Similarity Matrices (TSMs) that model a fine-grained similarity pattern between pairs of videos.
Our method achieves performance comparable to or better than state-of-the-art fully-supervised and few-shot learning methods.
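A Temporal Similarity Matrix of the kind described can be sketched under the assumption of cosine similarity between per-frame feature vectors (the paper's matrices are learned, so this is only an illustrative stand-in):

```python
import math

def temporal_similarity_matrix(video_a, video_b):
    """Cosine similarity between every pair of frame features from two videos;
    entry [i][j] compares frame i of video_a with frame j of video_b."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in video_b] for u in video_a]

# Toy per-frame features: a trimmed support clip vs. an untrimmed video.
trimmed   = [(1.0, 0.0), (0.0, 1.0)]
untrimmed = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
tsm = temporal_similarity_matrix(trimmed, untrimmed)
```

High-similarity diagonals in such a matrix indicate where the untrimmed video replays the support clip's action, which is what boundary-free localization exploits.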
arXiv Detail & Related papers (2021-06-08T07:32:43Z)
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction, which the subsequent stages refine.
Our models achieve state-of-the-art results on three datasets.
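The multi-stage refinement idea can be illustrated with a toy sketch in which each "stage" merely smooths the previous stage's frame-wise class scores; MS-TCN++'s real stages are dilated temporal convolutional networks, so treat this as an analogy, not the method:

```python
def refine_stage(probs):
    """One toy refinement stage: average each frame's class scores with its
    immediate neighbors (a stand-in for a stage's temporal convolutions)."""
    T = len(probs)
    out = []
    for t in range(T):
        window = probs[max(0, t - 1):min(T, t + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def multi_stage_predict(initial_probs, num_stages=3):
    """First-stage scores refined by each subsequent stage; the final
    frame-wise labels are the per-frame argmax."""
    probs = initial_probs
    for _ in range(num_stages - 1):
        probs = refine_stage(probs)
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

# A single noisy frame flips to class 1 mid-segment; refinement smooths it away.
initial = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.9, 0.1]]
print(multi_stage_predict(initial))  # [0, 0, 0, 0, 0]
```

This captures why stacking stages reduces over-segmentation errors: each stage sees the previous stage's predictions and can suppress short, implausible label flips.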
arXiv Detail & Related papers (2020-06-16T14:50:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.