Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
- URL: http://arxiv.org/abs/2006.15418v1
- Date: Sat, 27 Jun 2020 18:00:42 GMT
- Title: Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
- Authors: Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet,
Andrew Zisserman
- Abstract summary: We present an approach for estimating the period with which an action is repeated in a video.
The crux of the approach lies in constraining the period prediction module to use temporal self-similarity.
We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection.
- Score: 82.26003709476848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach for estimating the period with which an action is
repeated in a video. The crux of the approach lies in constraining the period
prediction module to use temporal self-similarity as an intermediate
representation bottleneck that allows generalization to unseen repetitions in
videos in the wild. We train this model, called RepNet, with a synthetic
dataset that is generated from a large unlabeled video collection by sampling
short clips of varying lengths and repeating them with different periods and
counts. This combination of synthetic data and a powerful yet constrained
model allows us to predict periods in a class-agnostic fashion. Our model
substantially exceeds state-of-the-art performance on existing periodicity
(PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new
challenging dataset called Countix (~90 times larger than existing datasets)
which captures the challenges of repetition counting in real-world videos.
Project webpage: https://sites.google.com/view/repnet .
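To make the two core ideas above concrete, here is a minimal sketch, assuming NumPy, random vectors standing in for learned per-frame embeddings, and a placeholder temperature (all illustrative, not the paper's exact formulation): a repeating video is synthesized with a known count, and a row-normalized temporal self-similarity matrix is computed from its per-frame embeddings.

```python
import numpy as np

def make_synthetic_repetition(clip, count):
    """Tile a short clip 'count' times, giving a repeating video with a
    known period and count -- the synthetic-data recipe from the abstract."""
    return np.concatenate([clip] * count, axis=0)

def temporal_self_similarity(embs, temperature=1.0):
    """Pairwise negative squared distances between per-frame embeddings,
    softmax-normalized per row: the self-similarity bottleneck."""
    d2 = ((embs[:, None, :] - embs[None, :, :]) ** 2).sum(-1)   # (T, T)
    sim = -d2 / temperature
    sim -= sim.max(axis=1, keepdims=True)                       # stability
    e = np.exp(sim)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
clip = rng.normal(size=(8, 64))          # 8 frames x 64-dim embedding stand-ins
video = make_synthetic_repetition(clip, count=4)   # 32 frames, count = 4
tsm = temporal_self_similarity(video)
print(tsm.shape)   # (32, 32); periodic diagonal stripes encode the period
```

Because the period predictor only ever sees this T x T matrix rather than the embeddings themselves, the representation carries no class-specific appearance information, which is what permits generalization to unseen actions.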
Related papers
- OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos [58.5538620720541]
The dataset, OVR, contains annotations for over 72K videos.
OVR is almost an order of magnitude larger than previous datasets for video repetition.
We propose a baseline transformer-based counting model, OVRCounter, that can count repetitions in videos up to 320 frames long.
arXiv Detail & Related papers (2024-07-24T08:22:49Z)
- Every Shot Counts: Using Exemplars for Repetition Counting in Videos [66.1933685445448]
We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos.
Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos.
arXiv Detail & Related papers (2024-03-26T19:54:21Z)
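As a rough illustration of the exemplar idea in ESCounts, and not the paper's actual architecture, the sketch below uses single-head dot-product cross-attention (all shapes and names are assumptions) to let the frames of a target video attend over one exemplar repetition:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def exemplar_cross_attention(video_feats, exemplar_feats):
    """Video frames (queries) attend over exemplar frames (keys/values),
    yielding correspondence-aware features for a downstream counting head."""
    d = video_feats.shape[-1]
    attn = softmax(video_feats @ exemplar_feats.T / np.sqrt(d))   # (Tv, Te)
    return attn @ exemplar_feats                                  # (Tv, d)

rng = np.random.default_rng(1)
video = rng.normal(size=(64, 128))       # 64 frames of the target video
exemplar = rng.normal(size=(16, 128))    # one exemplar repetition
print(exemplar_cross_attention(video, exemplar).shape)   # (64, 128)
```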
- Full Resolution Repetition Counting [19.676724611655914]
Given an untrimmed video, repetitive action counting aims to estimate the number of repetitions of class-agnostic actions.
Recent state-of-the-art methods commonly down-sample the video in time, causing some repetitions to be missed.
In this paper, we attempt to understand repetitive actions from a full temporal resolution view, by combining offline feature extraction and temporal convolution networks.
arXiv Detail & Related papers (2023-05-23T07:45:56Z)
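A minimal sketch of the full-resolution recipe, assuming PyTorch, offline-extracted per-frame features, and a hypothetical per-frame density head whose sum gives the count; the paper's actual network will differ:

```python
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    """Dilated 1D convolutions over per-frame features at full temporal
    resolution (no down-sampling), ending in a density head summed to a count."""
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, hidden, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, 1, 1),
        )

    def forward(self, feats):                 # feats: (B, T, dim)
        density = self.net(feats.transpose(1, 2)).squeeze(1)   # (B, T)
        return density.sum(dim=1)             # one predicted count per video

feats = torch.randn(2, 300, 128)              # 2 videos, all 300 frames kept
print(TinyTCN()(feats))                       # tensor of 2 predicted counts
```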
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently under temporal transformations of the video.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
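The consistency intuition can be shown with a toy, differentiable boundary predictor; the soft-argmax formulation below is an illustrative assumption, not the paper's loss:

```python
import torch

def soft_boundary(scores):
    """Differentiable 'boundary' as a soft-argmax over per-frame scores."""
    w = torch.softmax(scores, dim=0)
    idx = torch.arange(len(scores), dtype=scores.dtype)
    return (w * idx).sum()

def consistency_loss(scores, shift):
    """If the video (here: its frame scores) is shifted in time, the
    predicted boundary should shift by the same amount (toy formulation;
    torch.roll wraps around, so keep |shift| small relative to length)."""
    b_orig = soft_boundary(scores)
    b_aug = soft_boundary(torch.roll(scores, shifts=shift, dims=0))
    return (b_aug - (b_orig + shift)).pow(2)

scores = torch.randn(100, requires_grad=True)
loss = consistency_loss(scores, shift=5)
loss.backward()                 # the penalty is differentiable end-to-end
print(loss.item())
```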
- HyperTime: Implicit Neural Representation for Time Series [131.57172578210256]
Implicit neural representations (INRs) have recently emerged as a powerful tool that provides an accurate and resolution-independent encoding of data.
In this paper, we analyze the representation of time series using INRs, comparing different activation functions in terms of reconstruction accuracy and training convergence speed.
We propose a hypernetwork architecture that leverages INRs to learn a compressed latent representation of an entire time series dataset.
arXiv Detail & Related papers (2022-08-11T14:05:51Z)
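A minimal sine-activation INR fit to one toy series shows what a continuous, resolution-independent encoding means in practice; the hypernetwork that compresses a whole dataset is omitted, and all sizes and the w0 frequency are assumptions:

```python
import torch
import torch.nn as nn

class SineINR(nn.Module):
    """f(t) -> value with sine activations (SIREN-style), so the series is
    stored as network weights and can be queried at any timestamp."""
    def __init__(self, hidden=64, w0=30.0):
        super().__init__()
        self.l1 = nn.Linear(1, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.l3 = nn.Linear(hidden, 1)
        self.w0 = w0

    def forward(self, t):
        h = torch.sin(self.w0 * self.l1(t))
        return self.l3(torch.sin(self.l2(h)))

t = torch.linspace(0, 1, 200).unsqueeze(-1)        # timestamps in [0, 1]
y = torch.sin(12 * t) + 0.1 * torch.randn_like(t)  # noisy toy series
model = SineINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):                               # short demo fit
    opt.zero_grad()
    loss = ((model(t) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(loss.item())                                 # reconstruction MSE
```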
- Representation Recycling for Streaming Video Analysis [19.068248496174903]
StreamDEQ aims to infer frame-wise representations on videos with minimal per-frame computation.
We show that StreamDEQ is able to recover near-optimal representations in a few frames' time and maintain an up-to-date representation throughout the video duration.
arXiv Detail & Related papers (2022-04-28T13:35:14Z)
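A toy sketch of the warm-starting idea (not the actual StreamDEQ architecture): each frame's equilibrium representation is iterated from the previous frame's solution, so after the first frame only a couple of fixed-point iterations are needed. The tanh layer, shapes, and iteration counts are assumptions:

```python
import numpy as np

def layer(z, x, W_z, W_x):
    """One implicit-layer step: z <- tanh(z @ W_z + x @ W_x)."""
    return np.tanh(z @ W_z + x @ W_x)

def stream_deq(frames, W_z, W_x, iters_first=30, iters_rest=2):
    """Reuse (recycle) the previous frame's fixed point as the starting
    point for the next frame's iteration."""
    z, reps = np.zeros(W_z.shape[0]), []
    for i, x in enumerate(frames):
        for _ in range(iters_first if i == 0 else iters_rest):
            z = layer(z, x, W_z, W_x)
        reps.append(z.copy())
    return reps

rng = np.random.default_rng(2)
W_z = 0.1 * rng.normal(size=(32, 32))    # small norm encourages convergence
W_x = rng.normal(size=(16, 32))
reps = stream_deq(rng.normal(size=(10, 16)), W_z, W_x)   # 10 streaming frames
print(len(reps), reps[0].shape)          # 10 (32,)
```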
- Learning from Irregularly-Sampled Time Series: A Missing Data Perspective [18.493394650508044]
Irregularly-sampled time series occur in many domains including healthcare.
We model irregularly-sampled time series data as a sequence of index-value pairs sampled from a continuous but unobserved function.
We propose learning methods for this framework based on variational autoencoders and generative adversarial networks.
arXiv Detail & Related papers (2020-08-17T20:01:55Z)
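A small sketch of the index-value-pair view, plus the padding and masking such variable-length sequences typically need before any VAE- or GAN-based encoder; the encoders themselves are omitted and everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 10.0, size=12))     # irregular timestamps
v = np.sin(t) + 0.1 * rng.normal(size=t.shape)   # observed values
series = list(zip(t, v))                         # the (index, value) pairs

def pad_batch(batch, max_len):
    """Pad variable-length (t, v) sequences and return a mask so a batch
    can be fed to a sequence model without inventing observations."""
    ts = np.zeros((len(batch), max_len))
    vs = np.zeros_like(ts)
    mask = np.zeros_like(ts)
    for i, s in enumerate(batch):
        n = len(s)
        ts[i, :n] = [p[0] for p in s]
        vs[i, :n] = [p[1] for p in s]
        mask[i, :n] = 1.0
    return ts, vs, mask

ts, vs, mask = pad_batch([series, series[:7]], max_len=12)
print(ts.shape, mask.sum(axis=1))                # (2, 12) [12. 7.]
```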
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
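To see where the efficiency comes from, here is a tiny NumPy sketch of a tensor-train factorization of a 3-way weight tensor; the convolutional and LSTM parts are omitted and the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
d, r = 4, 3                               # mode size and TT-rank
G1 = rng.normal(size=(1, d, r))           # tensor-train cores
G2 = rng.normal(size=(r, d, r))
G3 = rng.normal(size=(r, d, 1))

# Contracting the cores reconstructs the full d x d x d tensor, yet the
# cores store only O(d * r**2) numbers instead of d**3.
full = np.einsum('xia,ajb,bky->ijk', G1, G2, G3)
print(full.shape)                         # (4, 4, 4)
```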
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.