A Comprehensive Study of Deep Video Action Recognition
- URL: http://arxiv.org/abs/2012.06567v1
- Date: Fri, 11 Dec 2020 18:54:08 GMT
- Title: A Comprehensive Study of Deep Video Action Recognition
- Authors: Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong,
Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
- Abstract summary: Video action recognition is one of the representative tasks for video understanding.
We provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition.
- Score: 35.7068977497202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video action recognition is one of the representative tasks for video
understanding. Over the last decade, we have witnessed great advancements in
video action recognition thanks to the emergence of deep learning. But we also
encountered new challenges, including modeling long-range temporal information
in videos, high computation costs, and incomparable results due to datasets and
evaluation protocol variances. In this paper, we provide a comprehensive survey
of over 200 existing papers on deep learning for video action recognition. We
first introduce the 17 video action recognition datasets that influenced the
design of models. Then we present video action recognition models in
chronological order: starting with early attempts at adapting deep learning,
then to the two-stream networks, followed by the adoption of 3D convolutional
kernels, and finally to the recent compute-efficient models. In addition, we
benchmark popular methods on several representative datasets and release code
for reproducibility. In the end, we discuss open problems and shed light on
opportunities for video action recognition to facilitate new research ideas.
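The abstract traces the field's evolution toward 3D convolutional kernels, which slide a filter over time as well as space so that motion is captured directly from stacked frames. As a purely illustrative sketch (not the survey's code; real models such as C3D or I3D use library layers like PyTorch's `nn.Conv3d`), a minimal valid (no-padding) 3D convolution over a single-channel clip can be written as:

```python
# Minimal sketch of the 3D convolution underlying video models (illustrative
# only; production models use optimized library layers, e.g. nn.Conv3d).

def conv3d_valid(clip, kernel):
    """Valid (no-padding) 3D convolution over a (T, H, W) clip.

    clip   : nested lists, shape (T, H, W) -- frames x height x width
    kernel : nested lists, shape (kt, kh, kw)
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):          # slide over time
        plane = []
        for i in range(H - kh + 1):      # slide over height
            row = []
            for j in range(W - kw + 1):  # slide over width
                s = sum(
                    clip[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                    for dt in range(kt)
                    for di in range(kh)
                    for dj in range(kw)
                )
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# A 2x2x2 averaging kernel applied to a tiny 2-frame, 3x3 clip.
clip = [
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
]
kernel = [[[0.125] * 2] * 2] * 2
print(conv3d_valid(clip, kernel))  # -> [[[2.5, 3.5], [5.5, 6.5]]]
```

Because the kernel spans two frames, each output value mixes information from adjacent time steps, which is exactly what lets 3D CNNs model short-range motion without a separate optical-flow input.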
Related papers
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- A Survey on Backbones for Deep Video Action Recognition [7.3390139372713445]
Action recognition is a key technology in building interactive metaverses.
This paper reviews several action recognition methods based on deep neural networks.
We introduce these methods in three parts: 1) two-stream networks and their variants, which, in this paper specifically, take RGB video frames and the optical-flow modality as input; 2) 3D convolutional networks, which operate directly on the RGB modality so that extracting separate motion information is no longer necessary; 3) Transformer-based methods, which bring models from natural language processing into computer vision and video understanding.
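The two-stream family described above typically combines its RGB and optical-flow branches by late fusion: each stream produces class scores, and the softmax-normalized probabilities are averaged. A hedged sketch under that assumption (the logits below are made-up placeholders; in a real system they come from two trained CNNs):

```python
# Illustrative two-stream late fusion: average the per-class probabilities of
# an RGB stream and an optical-flow stream. Logits are hypothetical.
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(rgb_logits, flow_logits):
    """Average the softmax probabilities of the two streams."""
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    return [(a + b) / 2 for a, b in zip(rgb_p, flow_p)]

rgb_logits = [2.0, 0.5, 0.1]   # hypothetical scores for 3 action classes
flow_logits = [1.0, 1.5, 0.2]
fused = late_fusion(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)  # -> 0 (the class the fused scores favor)
```

Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even when their score magnitudes differ.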
arXiv Detail & Related papers (2024-05-09T07:20:36Z)
- Exploring Explainability in Video Action Recognition [5.7782784592048575]
Video Action Recognition and Image Classification are foundational tasks in computer vision.
Video-TCAV aims to quantify the importance of specific concepts in the decision-making process of Video Action Recognition models.
We propose a machine-assisted approach to generate spatial and temporal concepts relevant to Video Action Recognition for testing Video-TCAV.
arXiv Detail & Related papers (2024-04-13T19:34:14Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
Our review shows that deep neural networks surpass most earlier techniques in feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen-view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
- TinyVIRAT: Low-resolution Video Action Recognition [70.37277191524755]
In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.
We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities.
We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach.
arXiv Detail & Related papers (2020-07-14T21:09:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.