IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting
- URL: http://arxiv.org/abs/2403.11959v2
- Date: Wed, 20 Mar 2024 11:58:23 GMT
- Title: IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting
- Authors: Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang
- Abstract summary: Video Action Counting (VAC) is crucial in analyzing repetitive actions in videos.
Traditional methods have overlooked the complexity of action repetitions, such as interruptions and the variability in cycle duration.
We introduce Irregular Video Action Counting (IVAC), which prioritizes modeling irregular repetition patterns in videos.
- Score: 24.596979713593765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Action Counting (VAC) is crucial in analyzing sports, fitness, and everyday activities by quantifying repetitive actions in videos. However, traditional VAC methods have overlooked the complexity of action repetitions, such as interruptions and the variability in cycle duration. Our research addresses this shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC). IVAC prioritizes modeling irregular repetition patterns in videos, which we define through two primary aspects: Inter-cycle Consistency and Cycle-interval Inconsistency. Inter-cycle Consistency ensures homogeneity in the spatial-temporal representations of cycle segments, signifying action uniformity within cycles. Cycle-interval Inconsistency highlights the importance of distinguishing between cycle segments and intervals based on their inherent content differences. To encapsulate these principles, we propose a new methodology that includes consistency and inconsistency modules, supported by a unique pull-push loss (P2L) mechanism. The IVAC-P2L model applies a pull loss to promote coherence among cycle segment features and a push loss to clearly distinguish features of cycle segments from interval segments. Empirical evaluations conducted on the RepCount dataset demonstrate that the IVAC-P2L model sets a new benchmark in VAC task performance. Furthermore, the model demonstrates exceptional adaptability and generalization across various video contents, outperforming existing models on two additional datasets, UCFRep and Countix, without the need for dataset-specific optimization. These results confirm the efficacy of our approach in addressing irregular repetitions in videos and pave the way for further advancements in video analysis and understanding.
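The pull-push loss (P2L) described above lends itself to a compact sketch. The snippet below is a minimal PyTorch reading of the idea, assuming mean-pooled per-segment embeddings; the L2 normalization, centroid-based distances, margin value, and the function name `pull_push_loss` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pull_push_loss(cycle_feats: torch.Tensor,
                   interval_feats: torch.Tensor,
                   margin: float = 1.0) -> torch.Tensor:
    """Hypothetical P2L sketch.

    cycle_feats:    (N, D) pooled embeddings of the N cycle segments
    interval_feats: (M, D) pooled embeddings of the M interval segments
    """
    cycle_feats = F.normalize(cycle_feats, dim=-1)
    interval_feats = F.normalize(interval_feats, dim=-1)

    # Pull: inter-cycle consistency -- draw every cycle embedding
    # toward the mean cycle embedding.
    centroid = cycle_feats.mean(dim=0, keepdim=True)       # (1, D)
    pull = (cycle_feats - centroid).pow(2).sum(dim=-1).mean()

    # Push: cycle-interval inconsistency -- a hinge keeps interval
    # embeddings at least `margin` (in squared distance) away from
    # the cycle centroid.
    dist = (interval_feats - centroid).pow(2).sum(dim=-1)  # (M,)
    push = F.relu(margin - dist).mean()

    return pull + push
```

As a sanity check, `pull_push_loss(torch.randn(4, 128), torch.randn(3, 128))` returns a scalar that could be added to a counting objective; how cycle and interval segments are located and pooled in the actual model is beyond this sketch.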
Related papers
- Towards Temporally Consistent Referring Video Object Segmentation [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z)
- Appearance-based Refinement for Object-Centric Motion Segmentation [95.80420062679104]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a simple selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTubeVOS, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- CTVIS: Consistent Training for Online Video Instance Segmentation [62.957370691452844]
Discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Recent online VIS methods leverage contrastive items (CIs) sourced from one reference frame only, which we argue is insufficient for learning highly discriminative embeddings.
We propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines.
arXiv Detail & Related papers (2023-07-24T08:44:25Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Temporal Transductive Inference for Few-Shot Video Object Segmentation [27.140141181513425]
Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training.
Key to our approach is the use of both global and local temporal constraints.
Empirically, our model outperforms state-of-the-art meta-learning approaches on YouTube-VIS by 2.8% mean intersection over union.
arXiv Detail & Related papers (2022-03-27T14:08:30Z)
- Deep Explicit Duration Switching Models for Time Series [84.33678003781908]
We propose a flexible model that is capable of identifying both state- and time-dependent switching dynamics.
State-dependent switching is enabled by a recurrent state-to-switch connection.
An explicit duration count variable is used to improve the time-dependent switching behavior.
arXiv Detail & Related papers (2021-10-26T17:35:21Z)
- Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR [35.7018440502825]
In a multi-stream paradigm, improving robustness entails handling a variety of unseen single-stream conditions and inter-stream dynamics.
We introduce a two-stage augmentation scheme focusing on mismatch scenarios.
Compared with the previous training strategy, substantial improvements are reported with relative word error rate reductions of 29.7-59.3%.
arXiv Detail & Related papers (2021-02-05T08:36:58Z)
- Context-aware and Scale-insensitive Temporal Repetition Counting [60.40438811580856]
Temporal repetition counting aims to estimate the number of cycles of a given repetitive action.
Existing deep learning methods assume repetitive actions are performed at a fixed time-scale, an assumption that does not hold for the complex repetitive actions in real life.
We propose a context-aware and scale-insensitive framework to tackle the challenges in repetition counting caused by the unknown and diverse cycle-lengths.
arXiv Detail & Related papers (2020-05-18T05:49:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.