FCA-RAC: First Cycle Annotated Repetitive Action Counting
- URL: http://arxiv.org/abs/2406.12178v1
- Date: Tue, 18 Jun 2024 01:12:43 GMT
- Title: FCA-RAC: First Cycle Annotated Repetitive Action Counting
- Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao
- Abstract summary: We propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC).
FCA-RAC contains four parts; the first is a labeling technique that annotates each training video with the start and end of the first action cycle, along with the total action count.
This technique enables the model to capture the correlation between the initial action cycle and subsequent actions.
- Score: 30.253568218869237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains four parts: 1) a labeling technique that annotates each training video with the start and end of the first action cycle, along with the total action count, enabling the model to capture the correlation between the initial action cycle and subsequent actions; 2) an adaptive sampling strategy that maximizes action information retention by adjusting to the speed of the first annotated action cycle in each video; 3) a Multi-Temporal Granularity Convolution (MTGC) module that leverages the multi-scale first action cycle as a kernel to convolve across the entire video, enabling the model to capture action variations at different time scales; 4) a strategy called Training Knowledge Augmentation (TKA) that exploits the annotated first-cycle information from the entire dataset, allowing the network to harness shared characteristics across actions and thereby improving performance and generalizability to unseen actions. Experimental results demonstrate that our approach achieves superior outcomes on RepCount-A and related datasets, highlighting the efficacy of our framework in improving model performance on both seen and unseen actions. Our paper contributes to the field of action counting by addressing the limitations of existing datasets and proposing novel techniques for improving model generalizability.
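To make part 3 concrete, here is a minimal sketch of the MTGC idea, assuming PyTorch and precomputed frame-level features; the function name, scale set, and interpolation-based resampling are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mtgc_similarity(video_feats, first_cycle_feats, scales=(4, 8, 16)):
    """Correlate the annotated first cycle, resampled to several temporal
    scales, against every position of the full video's feature sequence.

    video_feats:       (T, C) frame-level features of the whole video
    first_cycle_feats: (L, C) features of the annotated first action cycle
    Returns:           (len(scales), T) correlation map
    """
    T, C = video_feats.shape
    x = video_feats.t().unsqueeze(0)                       # (1, C, T)
    responses = []
    for s in scales:
        # Resample the first cycle to length s -> a multi-scale 1-D kernel.
        k = F.interpolate(first_cycle_feats.t().unsqueeze(0), size=s,
                          mode="linear", align_corners=False)  # (1, C, s)
        r = F.conv1d(x, k, padding=s // 2)                 # (1, 1, ~T)
        responses.append(r.squeeze(0).squeeze(0)[:T])
    return torch.stack(responses)                          # (len(scales), T)
```

Peaks in the resulting correlation map would mark candidate repetitions at each scale; how the paper aggregates these responses into a count, and whether the convolution is applied to raw or learned features, is not shown here.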
Related papers
- Recovering Complete Actions for Cross-dataset Skeleton Action Recognition [25.276593723734727]
We present a recover-and-resample augmentation framework based on a novel complete action prior.
By recovering complete actions and resampling from these full sequences, we can generate strong augmentations for unseen domains.
We validate our approach on a cross-dataset setting with three skeleton action datasets.
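As a rough illustration of the resampling half of this idea (the recovery of complete actions from a learned prior is not shown), the sketch below draws a random temporal window from a presumed-complete skeleton sequence and resamples it to a fixed length; shapes, names, and parameters are assumptions.

```python
import numpy as np

def resample_window(full_seq, out_len=64, min_ratio=0.5, rng=None):
    """Given a (presumed) complete action sequence of shape (T, J, 3)
    (frames x joints x xyz), draw a random temporal window and linearly
    resample it to a fixed length -- one simple way to generate temporal
    augmentations from a recovered full sequence."""
    rng = rng or np.random.default_rng()
    T = full_seq.shape[0]
    win = max(1, int(T * rng.uniform(min_ratio, 1.0)))
    start = rng.integers(0, T - win + 1)
    window = full_seq[start:start + win]                  # (win, J, 3)
    # Linearly resample every joint coordinate to out_len frames.
    src = np.linspace(0, win - 1, num=win)
    dst = np.linspace(0, win - 1, num=out_len)
    flat = window.reshape(win, -1)                        # (win, J*3)
    resampled = np.stack([np.interp(dst, src, flat[:, d])
                          for d in range(flat.shape[1])], axis=1)
    return resampled.reshape(out_len, *full_seq.shape[1:])
```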
arXiv Detail & Related papers (2024-10-31T05:27:58Z)
- Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting [87.11995635760108]
Key to action counting is accurately locating each video's repetitive actions.
We propose a dual-branch network, i.e., SkimFocusNet, working in a two-step manner.
arXiv Detail & Related papers (2024-06-13T05:15:52Z)
- Coherent Temporal Synthesis for Incremental Action Segmentation [42.46228728930902]
This paper presents the first exploration of video data replay techniques for incremental action segmentation.
We propose a Temporally Coherent Action model, which represents actions using a generative model instead of storing individual frames.
In a 10-task incremental setup on the Breakfast dataset, our approach achieves accuracy gains of up to 22% over the baselines.
arXiv Detail & Related papers (2024-03-10T06:07:06Z)
- Efficient Action Counting with Dynamic Queries [31.833468477101604]
We introduce a novel approach that employs an action query representation to localize repeated action cycles with linear computational complexity.
Unlike static action queries, this approach dynamically embeds video features into action queries, offering a more flexible and generalizable representation.
Our method significantly outperforms previous works, particularly on long video sequences, unseen actions, and actions at various speeds.
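A minimal sketch of the dynamic-query idea, assuming a DETR-style design in PyTorch: learnable queries are updated by cross-attending to the video feature sequence. The class name, dimensions, and single attention layer are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DynamicActionQueries(nn.Module):
    """Learnable action queries dynamically updated by cross-attending to
    video features, sketching the 'embed video features into queries' idea."""
    def __init__(self, num_queries=32, dim=256, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_feats):                 # video_feats: (B, T, dim)
        B = video_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)   # (B, Q, dim)
        # Each query gathers evidence from the whole feature sequence.
        attended, _ = self.cross_attn(q, video_feats, video_feats)
        return self.norm(q + attended)              # (B, Q, dim) dynamic queries
```

A small head on top of the returned queries could then regress cycle boundaries or a count; that readout is omitted here.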
arXiv Detail & Related papers (2024-03-03T15:43:11Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework based on denoising diffusion models, which share the same inherent spirit of iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
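The iterative refinement described here can be pictured as a standard DDPM reverse loop over per-frame action logits, conditioned on video features. The sketch below assumes a trained noise-prediction network `denoiser(x_t, t, cond)`; the noise schedule, step count, and softmax readout are illustrative choices, not the paper's.

```python
import torch

@torch.no_grad()
def sample_action_sequence(denoiser, video_feats, num_classes, steps=50):
    """Iteratively refine per-frame action predictions from pure noise,
    conditioned on video features, using a standard DDPM reverse loop."""
    B, T, _ = video_feats.shape
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(B, T, num_classes)                  # start from noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.full((B,), t), video_feats)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise         # x_{t-1}
    return x.softmax(dim=-1)                            # per-frame class probs
```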
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
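One of the two pretext tasks (temporal-segment permutation) is easy to sketch: cut a sequence into segments, shuffle them with a known permutation, and train a classifier to recover which permutation was applied; the body-part variant is analogous. Names and the segment count below are assumptions.

```python
import itertools
import torch

def permutation_pretext_batch(skeleton_seq, num_segments=3):
    """Split a skeleton sequence (T, J, C) into temporal segments, apply every
    permutation of the segments, and return the shuffled sequences together
    with the permutation index a classifier should predict."""
    perms = list(itertools.permutations(range(num_segments)))
    segments = torch.chunk(skeleton_seq, num_segments, dim=0)
    inputs, labels = [], []
    for idx, p in enumerate(perms):
        inputs.append(torch.cat([segments[i] for i in p], dim=0))
        labels.append(idx)
    return torch.stack(inputs), torch.tensor(labels)   # (P, T, J, C), (P,)
```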
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Enhancing Sequential Recommendation with Graph Contrastive Learning [64.05023449355036]
This paper proposes a novel sequential recommendation framework, namely Graph Contrastive Learning for Sequential Recommendation (GCL4SR).
GCL4SR employs a Weighted Item Transition Graph (WITG), built from the interaction sequences of all users, to provide global context for each interaction and to suppress noise in the sequence data.
Experiments on real-world datasets demonstrate that GCL4SR consistently outperforms state-of-the-art sequential recommendation methods.
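As a rough sketch of how such a transition graph could be built, the snippet below accumulates edge weights from consecutive item pairs across all users' sequences; the paper's exact weighting scheme (e.g., directed or distance-aware edges) may differ, so treat this purely as an assumption-laden illustration.

```python
from collections import defaultdict

def build_witg(user_sequences):
    """Build a weighted item transition graph: every consecutive pair (a, b)
    in any user's sequence adds weight to the undirected edge {a, b}, so
    frequently co-occurring transitions get heavier edges."""
    weights = defaultdict(float)
    for seq in user_sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1.0
    return dict(weights)

# Example: two users' item sequences share the (2, 3) transition.
graph = build_witg([[1, 2, 3, 4], [2, 3, 5]])
```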
arXiv Detail & Related papers (2022-05-30T03:53:31Z)
- Unsupervised Action Segmentation with Self-supervised Feature Learning and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame of a video with an action label.
In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos.
We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z)
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z)
- Dynamic Graph Collaborative Filtering [64.87765663208927]
Dynamic recommendation is essential for recommender systems to provide real-time predictions based on sequential data.
Here we propose Dynamic Graph Collaborative Filtering (DGCF), a novel framework leveraging dynamic graphs to capture collaborative and sequential relations.
Our approach achieves higher performance when the dataset contains less action repetition, indicating the effectiveness of integrating dynamic collaborative information.
arXiv Detail & Related papers (2021-01-08T04:16:24Z)
- Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video [27.391434284586985]
Rolling-Unrolling LSTM is a learning architecture to anticipate actions from egocentric videos.
The proposed approach is validated on EPIC-Kitchens, EGTEA Gaze+ and ActivityNet.
arXiv Detail & Related papers (2020-05-04T14:13:41Z)