A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion
Compensation for Action Recognition in the EPIC-Kitchens Dataset
- URL: http://arxiv.org/abs/2008.11588v1
- Date: Wed, 26 Aug 2020 14:44:45 GMT
- Title: A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion
Compensation for Action Recognition in the EPIC-Kitchens Dataset
- Authors: Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós
- Abstract summary: Action recognition is one of the most challenging research fields in computer vision.
Egocentrically recorded sequences have become particularly relevant.
The proposed method copes with the ego-motion transferred to these sequences by estimating this ego-motion, or camera motion.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action recognition is currently one of the most challenging research fields in
computer vision. Convolutional Neural Networks (CNNs) have significantly
boosted its performance, but they rely on fixed-size spatio-temporal windows of
analysis, which reduces their temporal receptive fields. Among action recognition
datasets, egocentrically recorded sequences have become particularly relevant
while entailing an additional challenge: ego-motion is unavoidably transferred
to these sequences. The proposed method copes with this by estimating the
ego-motion, or camera motion. The estimate is used to temporally partition
video sequences into motion-compensated temporal chunks that show the
action under stable backgrounds and allow for content-driven temporal
sampling. A CNN trained in an end-to-end fashion extracts temporal
features from each chunk, and these features are late-fused. This process
extracts features from the whole temporal range of an action,
increasing the temporal receptive field of the network.
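
To make the pipeline concrete, the following is a minimal sketch of the chunking idea, assuming OpenCV for motion estimation: global camera motion between consecutive frames is approximated by a similarity transform fitted to sparse optical-flow tracks, chunk boundaries are cut where that motion peaks, and per-chunk features are late-fused by averaging. The thresholds, the similarity-transform proxy, and the averaging fusion are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of motion-compensated chunking: estimate camera motion
# between consecutive frames, cut the video where motion peaks, then
# late-fuse per-chunk features. Thresholds and the fusion rule are
# illustrative placeholders, not the authors' implementation.
import cv2
import numpy as np

def camera_motion_magnitude(prev_gray, gray):
    """Estimate global (camera) motion via sparse optical flow and a
    similarity transform robustly fitted to the tracked points."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    if pts is None:
        return 0.0
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_old = pts[status.flatten() == 1]
    good_new = nxt[status.flatten() == 1]
    if len(good_old) < 4:
        return 0.0
    M, _ = cv2.estimateAffinePartial2D(good_old, good_new, method=cv2.RANSAC)
    if M is None:
        return 0.0
    # The translation part of the transform is a crude ego-motion proxy.
    return float(np.hypot(M[0, 2], M[1, 2]))

def split_into_chunks(frames, motion_thresh=2.0, min_len=8):
    """Partition a list of BGR frames into chunks with stable background:
    start a new chunk whenever ego-motion exceeds the threshold."""
    chunks, current = [], [frames[0]]
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if camera_motion_magnitude(prev_gray, gray) > motion_thresh \
                and len(current) >= min_len:
            chunks.append(current)
            current = []
        current.append(frame)
        prev_gray = gray
    chunks.append(current)
    return chunks

def late_fuse(chunk_features):
    """Late fusion by averaging per-chunk feature vectors, so the fused
    descriptor covers the whole temporal range of the action."""
    return np.mean(np.stack(chunk_features), axis=0)
```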
Related papers
- TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and
Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
- Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between frames.
In this paper, we propose a recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
arXiv Detail & Related papers (2022-03-12T11:58:13Z)
- Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud [9.570438238511073]
Motion prediction is a key enabler for automated driving systems and intelligent transportation applications.
A current challenge is how to effectively combine different perception tasks into a single backbone.
We propose a novel attention network based on a transformer self-attention mechanism for joint semantic segmentation and motion prediction.
arXiv Detail & Related papers (2022-02-28T23:18:27Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-size temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Continuity-Discrimination Convolutional Neural Network for Visual Object Tracking [150.51667609413312]
This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN) for visual object tracking.
CD-CNN models temporal appearance continuity based on the idea of temporal slowness.
In order to alleviate inaccurate target localization and drifting, we propose a novel notion, object-centroid.
arXiv Detail & Related papers (2021-04-18T06:35:03Z)
- Coarse-Fine Networks for Temporal Activity Detection in Videos [45.03545172714305]
We introduce 'Coarse-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
We show that our method outperforms the state of the art for action detection on public datasets with a significantly reduced compute and memory footprint.
arXiv Detail & Related papers (2021-03-01T20:48:01Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions (see the sketch after this list).
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- Multivariate Time Series Classification Using Spiking Neural Networks [7.273181759304122]
Spiking neural networks have drawn attention as they enable low power consumption.
We present an encoding scheme to convert time series into sparse spatio-temporal spike patterns (see the sketch after this list).
A training algorithm to classify spatio-temporal patterns is also proposed.
arXiv Detail & Related papers (2020-07-07T15:24:01Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
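
As an aside on the Multi-Temporal Convolutions entry above, a minimal PyTorch sketch of a multi-resolution temporal convolution block follows: parallel 3D convolutions with different temporal kernel sizes, concatenated along the channel axis. The kernel sizes, channel split, and class name are assumptions for illustration, not the paper's exact block.

```python
# Hypothetical multi-temporal convolution block: parallel 3D convolutions
# with different temporal kernel sizes, concatenated along channels.
# Kernel sizes and channel split are illustrative, not the paper's design.
import torch
import torch.nn as nn

class MultiTemporalBlock(nn.Module):
    def __init__(self, in_ch, out_ch, temporal_ks=(1, 3, 5)):
        super().__init__()
        branch_ch = out_ch // len(temporal_ks)
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, branch_ch,
                      kernel_size=(k, 3, 3),
                      padding=(k // 2, 1, 1))  # preserves (T, H, W)
            for k in temporal_ks
        ])

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Example: a clip of 16 frames at 56x56 resolution with 64 channels.
clip = torch.randn(2, 64, 16, 56, 56)
block = MultiTemporalBlock(64, 96)
print(block(clip).shape)  # torch.Size([2, 96, 16, 56, 56])
```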
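
And for the spiking-network entry, one simple way to turn a multivariate time series into sparse spatio-temporal spike patterns is delta modulation: emit a signed spike whenever a channel drifts beyond a threshold from its last spiking value. This is a generic encoding sketched under that assumption, not necessarily the paper's scheme.

```python
# Hypothetical delta-modulation encoder: emit a +1/-1 spike at a channel
# whenever the signal changes by more than a threshold since that
# channel's last spike. A generic scheme, not necessarily the paper's.
import numpy as np

def delta_encode(series, threshold=0.1):
    """series: (time, channels) -> spikes: (time, channels) in {-1, 0, +1}"""
    series = np.asarray(series, dtype=float)
    spikes = np.zeros_like(series)
    ref = series[0].copy()  # last value that triggered a spike, per channel
    for t in range(1, len(series)):
        delta = series[t] - ref
        up = delta > threshold
        down = delta < -threshold
        spikes[t][up] = 1.0
        spikes[t][down] = -1.0
        ref[up | down] = series[t][up | down]
    return spikes

# Example: two sinusoids over 100 steps yield a mostly-zero spike train.
t = np.linspace(0, 2 * np.pi, 100)
x = np.stack([np.sin(t), np.cos(t)], axis=1)
print(np.count_nonzero(delta_encode(x, threshold=0.2)))
```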