Temporal Action Segmentation: An Analysis of Modern Techniques
- URL: http://arxiv.org/abs/2210.10352v5
- Date: Sat, 21 Oct 2023 04:57:29 GMT
- Title: Temporal Action Segmentation: An Analysis of Modern Techniques
- Authors: Guodong Ding, Fadime Sener, and Angela Yao
- Abstract summary: Temporal action segmentation (TAS) aims to densely label every frame of a minutes-long video with one of multiple action classes.
Despite the rapid growth of TAS techniques in recent years, no systematic survey of the area has been conducted.
This survey analyzes and summarizes the most significant contributions and trends.
- Score: 43.725939095985915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action segmentation (TAS) in videos aims to densely label
the frames of minutes-long videos with multiple action classes. For this
long-range video understanding task, researchers have developed an extensive
collection of methods and examined their performance on various benchmarks.
Despite the rapid growth of TAS techniques in recent years, no systematic
survey of the area has been conducted. This survey analyzes and summarizes
the most significant contributions and trends. In particular, we first examine
the task definition, common benchmarks, types of supervision, and prevalent
evaluation measures. In addition, we systematically investigate two essential
techniques of this topic, i.e., frame representation and temporal modeling,
which have been studied extensively in the literature. We then conduct a
thorough review of existing TAS works categorized by their levels of
supervision and conclude our survey by identifying and emphasizing several
research gaps. In addition, we have curated a list of TAS resources, which is
available at https://github.com/nus-cvml/awesome-temporal-action-segmentation.
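The survey covers prevalent evaluation measures for this dense frame-labeling task. As a hedged illustration (the helper names below are our own, not from the paper), frame-wise accuracy and the collapse of frame labels into segments, the representation that segmental metrics such as edit score and F1@k operate on, can be sketched as:

```python
def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted action label matches the ground truth."""
    assert len(pred) == len(gt)
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def to_segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) segments,
    with `end` exclusive."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[i - 1], start, i))
            start = i
    return segments

gt   = ["pour", "pour", "stir", "stir", "stir", "serve"]
pred = ["pour", "stir", "stir", "stir", "stir", "serve"]
print(frame_accuracy(pred, gt))  # 0.8333...
print(to_segments(pred))         # [('pour', 0, 1), ('stir', 1, 5), ('serve', 5, 6)]
```

One frame-level error that splits or extends a segment can hurt segmental metrics far more than frame accuracy, which is why TAS papers report both.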
Related papers
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- Temporal Sentence Grounding in Streaming Videos [60.67022943824329]
This paper aims to tackle a novel task: Temporal Sentence Grounding in Streaming Videos (TSGSV).
The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.
We propose two novel methods: (1) a TwinNet structure that enables the model to learn about upcoming events; and (2) a language-guided feature compressor that eliminates redundant visual frames.
arXiv Detail & Related papers (2023-08-14T12:30:58Z)
- A Survey on Deep Learning-based Spatio-temporal Action Detection [8.456482280676884]
STAD aims to classify the actions present in a video and localize them in space and time.
It has become a particularly active area of research in computer vision because of its explosively emerging real-world applications.
This paper provides a comprehensive review of the state-of-the-art deep learning-based methods for STAD.
arXiv Detail & Related papers (2023-08-03T08:48:14Z)
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
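A hedged sketch of that decoding step, assuming nearest-centroid assignment of frame embeddings followed by merging of consecutive same-cluster frames (an illustrative simplification, not TAEC's exact procedure):

```python
def decode_segments(frame_embeddings, centroids):
    """Assign each frame to its nearest cluster centroid (squared Euclidean
    distance), then merge runs of consecutive frames that share a cluster
    into (cluster, start, end) temporal segments, with `end` exclusive."""
    def nearest(e):
        return min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(e, centroids[k])))
    labels = [nearest(e) for e in frame_embeddings]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[i - 1]:
            segments.append((labels[i - 1], start, i))
            start = i
    return segments

# Toy 2-D frame embeddings and two cluster centroids.
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9), (0.0, 0.1)]
centroids = [(0.0, 0.0), (1.0, 1.0)]
print(decode_segments(frames, centroids))  # [(0, 0, 2), (1, 2, 4), (0, 4, 5)]
```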
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
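Detected action boundaries are conventionally scored by their temporal overlap with ground-truth intervals. A minimal sketch of temporal IoU, the standard matching criterion (the function name here is ours):

```python
def temporal_iou(a, b):
    """Intersection-over-union of two temporal intervals given as
    (start, end) pairs in seconds or frame indices."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # 0.3333...
```

A prediction typically counts as correct when its IoU with a ground-truth interval exceeds a threshold (e.g. 0.5), and benchmarks report mean average precision over several thresholds.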
arXiv Detail & Related papers (2021-09-30T22:42:25Z)
- A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query.
To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Few-Shot Action Localization without Knowing Boundaries [9.959844922120523]
We show that it is possible to learn to localize actions in untrimmed videos when only one/few trimmed examples of the target action are available at test time.
We propose a network that learns to estimate Temporal Similarity Matrices (TSMs) that model a fine-grained similarity pattern between pairs of videos.
Our method achieves performance comparable to or better than state-of-the-art fully-supervised and few-shot learning methods.
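A Temporal Similarity Matrix of the kind described can be sketched under the assumption of cosine similarity between per-frame feature vectors (the paper's matrices are learned, so this is only an illustrative stand-in):

```python
import math

def temporal_similarity_matrix(video_a, video_b):
    """Cosine similarity between every pair of frame features from two videos;
    entry [i][j] compares frame i of video_a with frame j of video_b."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in video_b] for u in video_a]

# Toy per-frame features: a trimmed support clip vs. an untrimmed video.
trimmed   = [(1.0, 0.0), (0.0, 1.0)]
untrimmed = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
tsm = temporal_similarity_matrix(trimmed, untrimmed)
```

High-similarity diagonals in such a matrix indicate where the untrimmed video replays the support clip's action, which is what boundary-free localization exploits.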
arXiv Detail & Related papers (2021-06-08T07:32:43Z)
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction, which the subsequent stages refine.
Our models achieve state-of-the-art results on three datasets.
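The multi-stage refinement idea can be illustrated with a toy sketch in which each "stage" merely smooths the previous stage's frame-wise class scores; MS-TCN++'s real stages are dilated temporal convolutional networks, so treat this as an analogy, not the method:

```python
def refine_stage(probs):
    """One toy refinement stage: average each frame's class scores with its
    immediate neighbors (a stand-in for a stage's temporal convolutions)."""
    T = len(probs)
    out = []
    for t in range(T):
        window = probs[max(0, t - 1):min(T, t + 2)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def multi_stage_predict(initial_probs, num_stages=3):
    """First-stage scores refined by each subsequent stage; the final
    frame-wise labels are the per-frame argmax."""
    probs = initial_probs
    for _ in range(num_stages - 1):
        probs = refine_stage(probs)
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

# A single noisy frame flips to class 1 mid-segment; refinement smooths it away.
initial = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.9, 0.1]]
print(multi_stage_predict(initial))  # [0, 0, 0, 0, 0]
```

This captures why stacking stages reduces over-segmentation errors: each stage sees the previous stage's predictions and can suppress short, implausible label flips.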
arXiv Detail & Related papers (2020-06-16T14:50:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.