OnlineTAS: An Online Baseline for Temporal Action Segmentation
- URL: http://arxiv.org/abs/2411.01122v1
- Date: Sat, 02 Nov 2024 03:52:08 GMT
- Title: OnlineTAS: An Online Baseline for Temporal Action Segmentation
- Authors: Qing Zhong, Guodong Ding, Angela Yao
- Abstract summary: This work presents an online framework for temporal action segmentation.
At the core of the framework is an adaptive memory designed to accommodate dynamic changes in context over time.
In addition, we propose a post-processing approach to mitigate the severe over-segmentation in the online setting.
- Score: 37.78120420622088
- License:
- Abstract: Temporal context plays a significant role in temporal action segmentation. In an offline setting, the context is typically captured by the segmentation network after observing the entire sequence. However, capturing and using such context information in an online setting remains an under-explored problem. This work presents an online framework for temporal action segmentation. At the core of the framework is an adaptive memory designed to accommodate dynamic changes in context over time, alongside a feature augmentation module that enhances the frames with the memory. In addition, we propose a post-processing approach to mitigate the severe over-segmentation in the online setting. On three common segmentation benchmarks, our approach achieves state-of-the-art performance.
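The abstract does not specify the post-processing step itself; as an illustration of the general idea, a sliding-window majority-vote filter over incoming frame predictions is one minimal way to suppress spurious label flips online (the function and variable names here are hypothetical, not the paper's method):

```python
from collections import Counter, deque

def smooth_online(pred_stream, window=9):
    """Majority-vote filter over a sliding window of recent frame labels.

    Emits one smoothed label per incoming prediction; an illustrative
    stand-in for an (unspecified) online over-segmentation post-process.
    """
    recent = deque(maxlen=window)
    for label in pred_stream:
        recent.append(label)
        # Most common label in the window; ties broken by first-seen order.
        yield Counter(recent).most_common(1)[0][0]

# Example: an isolated spurious "cut" frame is suppressed.
noisy = ["pour", "pour", "cut", "pour", "pour", "pour", "cut", "cut", "cut"]
smoothed = list(smooth_online(noisy, window=3))
```

Because the filter only looks at past frames, it is causal and usable in a streaming setting, at the cost of a short delay before genuine label changes appear in the output.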
Related papers
- Online Temporal Action Localization with Memory-Augmented Transformer [61.39427407758131]
We propose a memory-augmented transformer (MATR) for online temporal action localization.
MATR selectively preserves the past segment features, allowing to leverage long-term context for inference.
We also propose a novel action localization method that observes the current input segment to predict the end time of the ongoing action and accesses the memory queue to estimate the start time of the action.
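MATR's memory design is not detailed in this summary; the sketch below shows, under assumed shapes and thresholds, how a bounded queue might selectively preserve confident past segment features for later start-time estimation (the `SegmentMemory` class and its parameters are hypothetical):

```python
from collections import deque

class SegmentMemory:
    """Bounded queue of past segment features (hypothetical sketch).

    Selectively preserves segments: only those whose confidence passes a
    threshold are stored, and the oldest entries are evicted when full.
    """
    def __init__(self, capacity=32, min_score=0.5):
        self.queue = deque(maxlen=capacity)
        self.min_score = min_score

    def update(self, feature, score):
        # Keep only confident segment features as long-term context.
        if score >= self.min_score:
            self.queue.append(feature)

    def context(self):
        # Features available for estimating the ongoing action's start time.
        return list(self.queue)

mem = SegmentMemory(capacity=2, min_score=0.5)
mem.update([0.1, 0.2], score=0.9)
mem.update([0.3, 0.4], score=0.2)   # discarded: below threshold
mem.update([0.5, 0.6], score=0.8)
mem.update([0.7, 0.8], score=0.7)   # capacity reached: oldest entry evicted
```

The `deque(maxlen=...)` gives first-in-first-out eviction for free, which is one plausible way to bound memory growth over an unbounded video stream.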
arXiv Detail & Related papers (2024-08-06T04:55:33Z)
- O-TALC: Steps Towards Combating Oversegmentation within Online Action Segmentation [0.48748194765816943]
We introduce two methods for improved training and inference of backbone action recognition models.
Firstly, we introduce dense sampling during training to facilitate matching between training and inference clips and to improve segment boundary predictions.
Secondly, we introduce an Online Temporally Aware Label Cleaning (O-TALC) strategy to explicitly reduce oversegmentation during online inference.
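O-TALC's exact procedure is not given here; one simple online cleaning rule in the same spirit is to commit to a label change only after it persists for `k` consecutive frames, which removes segments shorter than `k` (a hypothetical sketch, not the paper's algorithm):

```python
def clean_online(labels, k=3):
    """Commit to a new label only after it persists for k consecutive frames.

    Segments shorter than k frames are absorbed into the preceding segment;
    an illustrative stand-in for online oversegmentation cleaning.
    """
    cleaned = []
    committed = None            # label currently being emitted
    candidate, count = None, 0  # pending label change and its run length
    for lab in labels:
        if committed is None:
            committed = lab     # the very first frame commits immediately
        if lab == committed:
            candidate, count = None, 0
        elif lab == candidate:
            count += 1
            if count >= k:
                committed = lab
                candidate, count = None, 0
        else:
            candidate, count = lab, 1
        cleaned.append(committed)
    return cleaned

# A one-frame "b" blip is removed; the later three-frame "b" run survives.
cleaned = clean_online(["a", "a", "b", "a", "a", "b", "b", "b", "a"], k=3)
```

The trade-off is latency: the first `k - 1` frames of each genuine new segment still carry the previous label, which is the usual cost of causal cleaning.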
arXiv Detail & Related papers (2024-04-10T10:36:15Z)
- Temporally Consistent Referring Video Object Segmentation with Hybrid Memory [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z)
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
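Once per-frame cluster assignments exist, decoding the video into coherent temporal segments reduces to run-length grouping of consecutive identical labels; a minimal sketch (the embedding and clustering steps are assumed to have already produced the labels):

```python
def decode_segments(frame_clusters):
    """Group consecutive identical cluster labels into (label, start, end) runs.

    `end` is exclusive. This covers only the decoding step; producing the
    per-frame cluster labels (embedding + clustering) is assumed done.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_clusters) + 1):
        if i == len(frame_clusters) or frame_clusters[i] != frame_clusters[start]:
            segments.append((frame_clusters[start], start, i))
            start = i
    return segments

# Nine frames assigned to three clusters yield three contiguous segments.
segs = decode_segments([0, 0, 0, 1, 1, 2, 2, 2, 2])
```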
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- Temporal Segment Transformer for Action Segmentation [54.25103250496069]
We propose an attention-based approach, which we call the temporal segment transformer, for joint segment relation modeling and denoising.
The main idea is to denoise segment representations using attention between segment and frame representations, and also use inter-segment attention to capture temporal correlations between segments.
We show that this novel architecture achieves state-of-the-art accuracy on the popular 50Salads, GTEA and Breakfast benchmarks.
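The summary's core idea, attention between segment and frame representations, can be illustrated with a bare scaled dot-product attention in which each segment query aggregates frame features; this is a generic sketch, not the paper's architecture:

```python
import math

def attend(queries, keys, values):
    """Scaled dot-product attention in pure Python.

    Each query (e.g. a segment representation) aggregates the values
    (e.g. frame features) weighted by similarity to the keys. A generic
    illustration of segment-to-frame attention, not the paper's model.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                             # for numerical stability
        weights = [math.exp(s - m) for s in scores]  # softmax numerators
        z = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(len(values[0]))
        ])
    return out

# One segment query attending over three frame features: the two frames
# aligned with the query dominate the aggregated (denoised) representation.
refined = attend([[1.0, 0.0]],
                 [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
                 [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
```

Stacking such attention in both directions (segment-to-frame and inter-segment) is the gist of what the summary describes.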
arXiv Detail & Related papers (2023-02-25T13:05:57Z)
- Local Memory Attention for Fast Video Semantic Segmentation [157.7618884769969]
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline.
Our approach aggregates a rich representation of the semantic information in past frames into a memory module.
We observe improvements in segmentation performance on Cityscapes of 1.7% and 2.1% mIoU, while increasing the inference time of ERFNet by only 1.5 ms.
arXiv Detail & Related papers (2021-01-05T18:57:09Z)
- Hierarchical Attention Network for Action Segmentation [45.19890687786009]
The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in videos.
We propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time.
We evaluate our system on challenging public benchmark datasets, including MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets.
arXiv Detail & Related papers (2020-05-07T02:39:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.