Temporal Action Segmentation: An Analysis of Modern Techniques
- URL: http://arxiv.org/abs/2210.10352v5
- Date: Sat, 21 Oct 2023 04:57:29 GMT
- Title: Temporal Action Segmentation: An Analysis of Modern Techniques
- Authors: Guodong Ding, Fadime Sener, and Angela Yao
- Abstract summary: Temporal action segmentation (TAS) in videos aims at densely labeling the frames of minutes-long videos with multiple action classes.
Despite the rapid growth of TAS techniques in recent years, no systematic survey of the area has been conducted.
This survey analyzes and summarizes the most significant contributions and trends.
- Score: 43.725939095985915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action segmentation (TAS) in videos aims at densely labeling
the frames of minutes-long videos with multiple action classes. As TAS is a
long-range video understanding task, researchers have developed an extended
collection of methods and examined their performance on various benchmarks.
Despite the rapid growth of TAS techniques in recent years, no systematic
survey of the area has been conducted. This survey analyzes and summarizes
the most significant contributions and trends. In particular, we first examine
the task definition, common benchmarks, types of supervision, and prevalent
evaluation measures. In addition, we systematically investigate two essential
techniques of this topic, i.e., frame representation and temporal modeling,
which have been studied extensively in the literature. We then conduct a
thorough review of existing TAS works categorized by their levels of
supervision and conclude our survey by identifying and emphasizing several
research gaps. In addition, we have curated a list of TAS resources, which is
available at https://github.com/nus-cvml/awesome-temporal-action-segmentation.
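For concreteness, the evaluation measures prevalent in the TAS literature are frame-wise accuracy (mean over frames, MoF), the segmental edit score, and segmental F1 at several IoU thresholds. The minimal Python sketch below (our illustration, not code from the survey) computes the first two from per-frame label sequences; papers conventionally report the edit score scaled by 100.

```python
# A minimal sketch of two prevalent TAS measures: frame-wise accuracy
# (MoF) and the segmental edit score (1 minus the normalized Levenshtein
# distance between predicted and ground-truth segment label sequences).

def frame_accuracy(pred, gt):
    """Mean-over-frames (MoF): fraction of correctly labeled frames."""
    assert len(pred) == len(gt)
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def to_segments(labels):
    """Collapse frame-wise labels into the ordered sequence of segment labels."""
    return [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]

def edit_score(pred, gt):
    """Segmental edit score in [0, 1]; papers usually report it x100."""
    p, g = to_segments(pred), to_segments(gt)
    m, n = len(p), len(g)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1.0 - d[m][n] / max(m, n)

# Toy example: a prediction that over-segments a two-action video.
gt   = ["pour"] * 5 + ["stir"] * 5
pred = ["pour"] * 3 + ["stir"] * 2 + ["pour"] * 1 + ["stir"] * 4
print(frame_accuracy(pred, gt))  # 0.7
print(edit_score(pred, gt))      # 0.5 (pour,stir,pour,stir vs pour,stir)
```

The toy example shows why both measures are used together: the prediction is 70% correct frame-wise, yet its over-segmentation halves the edit score, which is exactly the failure mode segmental measures are designed to expose.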
Related papers
- Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks [26.007846170517055]
We propose a single unified framework, coined Temporal2Seq, which formulates the outputs of temporal video understanding tasks as sequences of discrete tokens.
With this unified token representation, Temporal2Seq can train a generalist model within a single architecture on different video understanding tasks.
We evaluate our Temporal2Seq generalist model on the corresponding test sets of three tasks, demonstrating that Temporal2Seq can produce reasonable results on various tasks.
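As a rough illustration of this token formulation, here is a hypothetical sketch (the bin count and vocabulary layout are our assumptions, not Temporal2Seq's actual design): a dense frame-labeling output can be serialized by quantizing each segment's boundaries into time-bin tokens and appending a class token.

```python
# Hypothetical token serialization for a TAS-style output: each segment
# becomes a (start_bin, end_bin, class) token triplet over a quantized
# time axis. NUM_BINS and the vocabulary offsets are illustrative only.

NUM_BINS = 100  # assumed size of the quantized-time vocabulary

def segments_to_tokens(labels):
    """Serialize frame-wise integer labels into a flat token sequence."""
    n, tokens, start = len(labels), [], 0
    for i in range(1, n + 1):
        if i == n or labels[i] != labels[start]:
            s = round(start / n * (NUM_BINS - 1))    # start-time token
            e = round((i - 1) / n * (NUM_BINS - 1))  # end-time token
            c = NUM_BINS + labels[start]             # class tokens sit past the time bins
            tokens += [s, e, c]
            start = i
    return tokens

print(segments_to_tokens([0, 0, 0, 1, 1]))  # [0, 40, 100, 59, 79, 101]
```

Under such a representation, a single sequence decoder can serve different temporal tasks simply by changing which token triplets it is trained to emit.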
arXiv Detail & Related papers (2024-09-27T06:37:47Z) - Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events in videos that deviate from normality.
In the era of deep learning, a great variety of deep learning-based methods continue to emerge for the VAD task.
This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z) - Segment Anything for Videos: A Systematic Survey [52.28931543292431]
The recent wave of foundation models has achieved tremendous success in computer vision (CV) and beyond.
The segment anything model (SAM) has sparked widespread enthusiasm for exploring task-agnostic visual foundation models.
This work conducts a systematic review on SAM for videos in the era of foundation models.
arXiv Detail & Related papers (2024-07-31T02:24:53Z) - A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z) - A Survey on Deep Learning-based Spatio-temporal Action Detection [8.456482280676884]
Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time.
It has become a particularly active area of research in computer vision because of its rapidly emerging real-world applications.
This paper provides a comprehensive review of the state-of-the-art deep learning-based methods for STAD.
arXiv Detail & Related papers (2023-08-03T08:48:14Z) - TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and
Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we introduce a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
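A generic sketch of this cluster-then-decode idea (an assumption-laden illustration, not TAEC's exact pipeline): group frame embeddings with k-means, then smooth the per-frame assignments into temporally coherent segments.

```python
# Cluster-then-decode sketch: k-means groups per-frame embeddings into
# candidate action classes; median filtering suppresses single-frame
# label flips so the result forms coherent temporal segments.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.cluster import KMeans

def cluster_and_decode(embeddings, n_actions, smooth=15):
    """embeddings: (T, D) per-frame features -> (T,) action-cluster labels."""
    assign = KMeans(n_clusters=n_actions, n_init=10).fit_predict(embeddings)
    # TAEC decodes more carefully; a median filter is the simplest way
    # to turn raw assignments into contiguous segments.
    return median_filter(assign, size=smooth)

# Toy usage: 200 frames of 64-dim embeddings, assumed to contain 3 actions.
segments = cluster_and_decode(np.random.randn(200, 64), n_actions=3)
```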
arXiv Detail & Related papers (2023-03-09T10:46:23Z) - Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
arXiv Detail & Related papers (2021-09-30T22:42:25Z) - A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query.
To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning-based approaches dedicated to video segmentation have delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented (including the generated summaries) and is not responsible for any consequences of its use.