Challenge report: VIPriors Action Recognition Challenge
- URL: http://arxiv.org/abs/2007.08180v1
- Date: Thu, 16 Jul 2020 08:40:31 GMT
- Title: Challenge report: VIPriors Action Recognition Challenge
- Authors: Zhipeng Luo, Dawei Xu, Zhiguang Zhang
- Abstract summary: Action recognition has attracted many researchers' attention for its wide range of applications, but it remains challenging.
In this paper, we study previous methods and propose our own.
We use a fast but effective way to extract motion features from videos by taking residual frames as input.
- Score: 14.080142383692417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is a brief report on our submission to the VIPriors Action
Recognition Challenge. Action recognition has attracted many researchers'
attention for its wide range of applications, but it remains challenging. In
this paper, we study previous methods and propose our own. Our method primarily
improves on the SlowFast network and fuses it with TSM to push performance
further. We also use a fast but effective way to extract motion features from
videos by taking residual frames as input. Better motion features can be
extracted from residual frames with SlowFast, and the residual-frame-input path
is an excellent supplement to existing RGB-frame-input models. Better
performance is obtained by combining 3D convolution (SlowFast) with 2D
convolution (TSM). All of the above experiments were trained from scratch on
UCF101.
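To make the pipeline in the abstract concrete, below is a minimal sketch (not the authors' released code) of how residual frames can be computed from a stacked RGB clip and how per-class scores from a residual-frame 3D path (e.g. SlowFast) and an RGB 2D path (e.g. TSM) might be fused. The clip shape, class count, equal fusion weight, and function names are illustrative assumptions; the report does not specify these details.

```python
# Minimal sketch of residual-frame extraction and score-level fusion.
# Shapes, weights, and names below are illustrative assumptions, not
# values taken from the challenge report.
import numpy as np

def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Turn a stacked RGB clip of shape (T, H, W, C) into T-1 residual frames.

    Each residual frame is the difference between consecutive frames, which
    suppresses static appearance and emphasizes motion.
    """
    clip = clip.astype(np.float32)
    return clip[1:] - clip[:-1]

def fuse_scores(scores_3d: np.ndarray, scores_2d: np.ndarray,
                weight_3d: float = 0.5) -> np.ndarray:
    """Weighted average of per-class scores from a 3D model (e.g. SlowFast on
    residual frames) and a 2D model (e.g. TSM on RGB frames).

    The 0.5 default weight is an assumption; the report does not state the
    exact fusion scheme.
    """
    return weight_3d * scores_3d + (1.0 - weight_3d) * scores_2d

# Example: a random 16-frame 112x112 RGB clip and random UCF101-sized scores.
clip = np.random.randint(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
res = residual_frames(clip)                                  # (15, 112, 112, 3)
fused = fuse_scores(np.random.rand(101), np.random.rand(101))  # (101,)
print(res.shape, fused.shape)
```

The appeal of residual frames, as the abstract suggests, is that they provide a motion-oriented input at essentially no extra cost compared to optical flow, so the residual-frame path can complement a standard RGB-frame path.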
Related papers
- DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking
Tasks [76.24996889649744]
We study masked autoencoder (MAE) pretraining on videos for matching-based downstream tasks, including visual object tracking (VOT) and video object segmentation (VOS).
We propose DropMAE, which adaptively performs spatial-attention dropout in the frame reconstruction to facilitate temporal correspondence learning in videos.
Our model sets new state-of-the-art performance on 8 out of 9 highly competitive video tracking and segmentation datasets.
arXiv Detail & Related papers (2023-04-02T16:40:42Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Towards Frame Rate Agnostic Multi-Object Tracking [76.82407173177138]
We propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.
Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information.
FAPS reflects all post-processing steps in training via tracking pattern matching and fusion.
arXiv Detail & Related papers (2022-09-23T04:25:19Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate information from only a limited number of adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition [10.185425416255294]
We propose to use residual frames as an alternative "lightweight" motion representation.
We also develop a new pseudo-3D convolution module which decouples 3D convolution into 2D and 1D convolutions; a rough sketch of this factorization follows this entry.
arXiv Detail & Related papers (2020-08-03T17:40:17Z)
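As a rough illustration of the decoupling mentioned in the entry above, here is a sketch of a pseudo-3D ((2+1)D-style) block, written as an assumed PyTorch module rather than code from that paper; the channel sizes and layer arrangement are illustrative assumptions.

```python
# Rough sketch of a pseudo-3D block: a 2D spatial convolution followed by a
# 1D temporal convolution, replacing a full 3D convolution. This is an assumed
# implementation for illustration, not code from the cited paper.
import torch
import torch.nn as nn

class Pseudo3DBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2D spatial convolution: kernel 1x3x3 over (T, H, W)
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # 1D temporal convolution: kernel 3x1x1
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W) video tensor
        return self.relu(self.bn(self.temporal(self.spatial(x))))

# Example: a batch of 2 clips, 3 channels, 8 frames, 56x56 resolution.
x = torch.randn(2, 3, 8, 56, 56)
y = Pseudo3DBlock(3, 16)(x)
print(y.shape)  # torch.Size([2, 16, 8, 56, 56])
```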
- Motion Representation Using Residual Frames with 3D CNN [43.002621928500425]
We propose a fast but effective way to extract motion features from videos, utilizing residual frames as the input data in 3D ConvNets.
By replacing traditional stacked RGB frames with residual ones, improvements of 35.6 and 26.6 percentage points in top-1 accuracy can be obtained.
arXiv Detail & Related papers (2020-06-21T07:35:41Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements on the UCF101 action recognition benchmark over state-of-the-art real-time methods: 5.4% higher accuracy and twice the inference speed, with a model that requires less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge [161.4188504786512]
Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.
Recent approaches mainly devote great efforts to reducing the model size for high efficiency.
We propose a simple and effective framework, dubbed TapLab, to tap into resources from the compressed domain.
arXiv Detail & Related papers (2020-03-30T08:13:47Z)
- Rethinking Motion Representation: Residual Frames with 3D ConvNets for Better Action Recognition [43.002621928500425]
We propose a fast but effective way to extract motion features from videos utilizing residual frames as the input data in 3D ConvNets.
By replacing traditional stacked RGB frames with residual ones, improvements of 20.5 and 12.5 percentage points in top-1 accuracy can be achieved.
Because residual frames contain little information about object appearance, we further use a 2D convolutional network to extract appearance features.
arXiv Detail & Related papers (2020-01-16T05:49:13Z)