Coherent Temporal Synthesis for Incremental Action Segmentation
- URL: http://arxiv.org/abs/2403.06102v1
- Date: Sun, 10 Mar 2024 06:07:06 GMT
- Title: Coherent Temporal Synthesis for Incremental Action Segmentation
- Authors: Guodong Ding, Hans Golong and Angela Yao
- Abstract summary: This paper presents the first exploration of video data replay techniques for incremental action segmentation.
We propose a Temporally Coherent Action model, which represents actions using a generative model instead of storing individual frames.
In a 10-task incremental setup on the Breakfast dataset, our approach achieves significant accuracy gains of up to 22% over the baselines.
- Score: 42.46228728930902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data replay is a successful incremental learning technique for images. It
prevents catastrophic forgetting by keeping a reservoir of previous data,
original or synthesized, to ensure the model retains past knowledge while
adapting to novel concepts. However, its application in the video domain is
rudimentary, as it simply stores frame exemplars for action recognition. This
paper presents the first exploration of video data replay techniques for
incremental action segmentation, focusing on action temporal modeling. We
propose a Temporally Coherent Action (TCA) model, which represents actions
using a generative model instead of storing individual frames. The integration
of a conditioning variable that captures temporal coherence allows our model to
understand the evolution of action features over time. Therefore, action
segments generated by TCA for replay are diverse and temporally coherent. In a
10-task incremental setup on the Breakfast dataset, our approach achieves
significant accuracy gains of up to 22% over the baselines.
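The abstract gives no implementation details, so the following is only a minimal sketch of the replay idea it describes: fit a compact generative model per action, conditioned on a variable that tracks temporal progress within a segment, and sample synthetic feature segments instead of storing frames. The class name, the Gaussian-residual-around-a-mean-trajectory model, and the polynomial basis over normalized progress t are all assumptions for illustration, not the authors' actual TCA design.

```python
import numpy as np

class TemporallyCoherentActionSketch:
    """Hypothetical per-action generative replay model: the feature
    distribution is conditioned on normalized segment progress t in [0, 1],
    so sampled segments evolve coherently over time (illustrative only)."""

    def __init__(self, feat_dim, degree=3):
        self.feat_dim = feat_dim
        self.degree = degree   # polynomial basis size for the mean trajectory
        self.coef = None       # (degree + 1, feat_dim) trajectory coefficients
        self.sigma = None      # per-dimension residual standard deviation

    def _basis(self, t):
        # Polynomial basis in the conditioning variable t (segment progress).
        return np.stack([t ** k for k in range(self.degree + 1)], axis=-1)

    def fit(self, segments):
        """segments: list of (T_i, feat_dim) frame-feature arrays, all
        belonging to one action class."""
        t = np.concatenate([np.linspace(0.0, 1.0, len(s)) for s in segments])
        x = np.concatenate(segments)          # (N, feat_dim) stacked features
        basis = self._basis(t)                # (N, degree + 1)
        # Least-squares fit of the mean feature trajectory mu(t).
        self.coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
        self.sigma = (x - basis @ self.coef).std(axis=0)

    def sample(self, length, rng=None):
        """Generate one temporally coherent synthetic segment for replay."""
        rng = rng or np.random.default_rng()
        t = np.linspace(0.0, 1.0, length)
        mean = self._basis(t) @ self.coef     # (length, feat_dim)
        return mean + rng.normal(size=(length, self.feat_dim)) * self.sigma
```

Under this reading, at each new incremental task one such model would be fitted per previously learned action, and its samples would stand in for stored video frames during replay.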
Related papers
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks (a generic causal-attention sketch follows this list).
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
arXiv Detail & Related papers (2024-07-25T06:03:02Z)
- FCA-RAC: First Cycle Annotated Repetitive Action Counting [30.253568218869237]
We propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC)
FCA-RAC consists of four parts, the first of which is a labeling technique that annotates each training video with the start and end of the first action cycle, along with the total action count.
This technique enables the model to capture the correlation between the initial action cycle and subsequent actions.
arXiv Detail & Related papers (2024-06-18T01:12:43Z)
- On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust the spatial relations and incorporates the temporal information.
Experiments reveal that, even without using any temporal information, SA-CT performs comparably to temporal-based methods on 3 of 4 benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies [0.0]
We present a novel hand-location-guided, high-resolution feature-enhanced model.
We also propose a simple yet effective method of deploying offline trained action recognition models for real time action segmentation.
arXiv Detail & Related papers (2022-11-24T16:19:22Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) consecutively regulates the intermediate representations to produce representations that emphasize the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution [9.902223920743872]
We introduce a new action-recognition benchmark that includes subtle short-duration actions labeled at a high temporal resolution.
We show that current state-of-the-art models based on segmentation produce noisy predictions when applied to these data.
We propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques.
arXiv Detail & Related papers (2021-11-03T21:06:36Z)
- Conditional Temporal Variational AutoEncoder for Action Video Prediction [66.63038712306606]
ACT-VAE predicts pose sequences for action clips from a single input image.
When connected with a plug-and-play Pose-to-Image (P2I) network, ACT-VAE can synthesize image sequences.
arXiv Detail & Related papers (2021-08-12T10:59:23Z)
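CausalTAD itself is not detailed in this digest; as a generic illustration of the causal-attention ingredient named in the first related paper above, the sketch below restricts each frame's attention to itself and earlier frames. The function name, the omission of learned query/key/value projections, and the single-head formulation are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def causal_self_attention(x):
    """Causal attention over frame features x: (T, D). Each time step
    attends only to itself and earlier steps, which is the generic
    'causal' constraint (illustrative only)."""
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)                  # (T, T) frame similarities
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf                       # mask out future positions
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                             # (T, D) causally mixed features
```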
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.