Full Resolution Repetition Counting
- URL: http://arxiv.org/abs/2305.13778v2
- Date: Wed, 24 May 2023 10:52:44 GMT
- Title: Full Resolution Repetition Counting
- Authors: Jianing Li and Bowen Chen and Zhiyong Wang and Honghai Liu
- Abstract summary: Given an untrimmed video, repetitive action counting aims to estimate the number of repetitions of class-agnostic actions.
Down-sampling is commonly utilized in recent state-of-the-art methods, causing some repetitions to be missed.
In this paper, we attempt to understand repetitive actions from a full temporal resolution view by combining offline feature extraction and temporal convolution networks.
- Score: 19.676724611655914
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Given an untrimmed video, repetitive action counting aims to estimate the
number of repetitions of class-agnostic actions. To handle videos and
repetitive actions of varying lengths, as well as the optimization challenges
of end-to-end video model training, recent state-of-the-art methods commonly
rely on down-sampling, which causes some repetitions to be missed. In this
paper, we attempt to understand repetitive actions from a full temporal
resolution view by combining offline feature extraction and temporal
convolution networks. The former enables us to train the repetition counting
network without down-sampling, preserving every repetition regardless of video
length and action frequency; the latter models all frames within a flexible,
dynamically expanding temporal receptive field to retrieve all repetitions
from a global perspective. We experimentally demonstrate that our method
achieves better or comparable performance on three public datasets, i.e.,
TransRAC, UCFRep and QUVA. We expect this work to encourage our community to
consider the importance of full temporal resolution.
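To make the abstract's recipe concrete, here is a minimal PyTorch sketch of the second stage: a temporal convolution network applied to pre-extracted, full-resolution frame features, with dilation doubling per layer so the receptive field expands dynamically with depth. The channel sizes, residual blocks, and density-style counting head are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    """Counts repetitions from cached per-frame features (no down-sampling)."""

    def __init__(self, feat_dim=2048, hidden=256, num_layers=8):
        super().__init__()
        self.proj = nn.Conv1d(feat_dim, hidden, kernel_size=1)
        self.blocks = nn.ModuleList()
        for i in range(num_layers):
            d = 2 ** i  # dilation doubles per layer, so the receptive field grows exponentially
            self.blocks.append(nn.Sequential(
                nn.Conv1d(hidden, hidden, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=1),
            ))
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)  # per-frame density

    def forward(self, feats):
        # feats: (B, T, feat_dim), produced offline by a frozen backbone over every frame
        x = self.proj(feats.transpose(1, 2))       # (B, hidden, T)
        for block in self.blocks:
            x = x + block(x)                       # residual keeps all T positions
        density = self.head(x).squeeze(1)          # (B, T)
        return density.sum(dim=1)                  # count = integral of the density

# Because features are extracted once offline, even very long videos fit in
# memory during training and no repetition is dropped by temporal sampling:
feats = torch.randn(2, 3000, 2048)  # two videos, 3000 frames each
counts = DilatedTCN()(feats)        # tensor of shape (2,)
```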
Related papers
- Every Shot Counts: Using Exemplars for Repetition Counting in Videos [66.1933685445448]
We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos.
Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos (a minimal cross-attention sketch in this spirit appears after this list).
arXiv Detail & Related papers (2024-03-26T19:54:21Z)
- TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting [30.541542156648894]
Existing methods focus on performing repetitive action counting in short videos.
We introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths.
With the help of fine-grained annotation of action cycles, we propose a density map regression-based method to predict the action period.
arXiv Detail & Related papers (2022-04-03T07:50:18Z)
- Repetitive Activity Counting by Sight and Sound [110.36526333035907]
This paper strives for repetitive activity counting in videos.
Different from existing works, which all analyze the visual video content only, we incorporate for the first time the corresponding sound into the repetition counting process.
arXiv Detail & Related papers (2021-03-24T11:15:33Z)
- Coarse-Fine Networks for Temporal Activity Detection in Videos [45.03545172714305]
We introduce 'Coarse-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
We show that our method can outperform the state of the art for action detection on public datasets with a significantly reduced compute and memory footprint.
arXiv Detail & Related papers (2021-03-01T20:48:01Z)
- An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate temporal information across frames and reduces the need for high-complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- Counting Out Time: Class Agnostic Video Repetition Counting in the Wild [82.26003709476848]
We present an approach for estimating the period with which an action is repeated in a video.
The crux of the approach lies in constraining the period prediction module to use temporal self-similarity.
We train this model, called RepNet, with a synthetic dataset generated from a large unlabeled video collection (a minimal sketch of the self-similarity computation appears after this list).
arXiv Detail & Related papers (2020-06-27T18:00:42Z)
- Context-aware and Scale-insensitive Temporal Repetition Counting [60.40438811580856]
Temporal repetition counting aims to estimate the number of cycles of a given repetitive action.
Existing deep learning methods assume repetitive actions are performed at a fixed time-scale, which does not hold for the complex repetitive actions in real life.
We propose a context-aware and scale-insensitive framework to tackle the challenges in repetition counting caused by the unknown and diverse cycle-lengths.
arXiv Detail & Related papers (2020-05-18T05:49:48Z)
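Two of the entries above lend themselves to short illustrations. First, a hedged sketch of the exemplar idea in Every Shot Counts: video tokens cross-attend to tokens from an exemplar repetition, so re-occurrences of the exemplar pattern stand out in the fused features. The module, dimensions, and single-layer design are assumptions for illustration, not the ESCounts architecture itself.

```python
import torch
import torch.nn as nn

class ExemplarCrossAttention(nn.Module):
    """One cross-attention layer: video queries attend to exemplar keys/values."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, exemplar_tokens):
        # video_tokens: (B, T, dim); exemplar_tokens: (B, S, dim)
        fused, _ = self.attn(video_tokens, exemplar_tokens, exemplar_tokens)
        return self.norm(video_tokens + fused)  # residual + norm

video = torch.randn(1, 512, 256)    # 512 video tokens
exemplar = torch.randn(1, 16, 256)  # 16 tokens from one exemplar repetition
out = ExemplarCrossAttention()(video, exemplar)  # (1, 512, 256)
```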
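Second, the temporal self-similarity that 'Counting Out Time' constrains its period predictor to use reduces to a small computation: pairwise distances between per-frame embeddings, turned into a row-normalized T x T matrix whose diagonal stripes reveal the period. The similarity form and temperature below are common choices and may differ from the paper's exact values.

```python
import torch

def self_similarity(embs, temperature=1.0):
    # embs: (T, D) per-frame embeddings -> (T, T) row-normalized similarity
    sq_dists = (embs[:, None, :] - embs[None, :, :]).pow(2).sum(-1)
    return torch.softmax(-sq_dists / temperature, dim=1)

# RepNet feeds each row of this matrix to a small transformer that predicts a
# per-frame period length; the count is then roughly the sum of 1/period over
# frames judged to be periodic.
sims = self_similarity(torch.randn(64, 128))  # 64 frames, 128-dim embeddings
```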