ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- URL: http://arxiv.org/abs/2105.10154v1
- Date: Fri, 21 May 2021 06:36:40 GMT
- Title: ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- Authors: Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo,
Wanli Ouyang, Xiaogang Wang
- Abstract summary: We propose a novel neural architecture search (NAS) method, termed ViPNAS, to search networks in both spatial and temporal levels for fast online video pose estimation.
In the spatial level, we carefully design the search space with five different dimensions including network depth, width, kernel size, group number, and attentions.
In the temporal level, we search over a series of temporal feature fusions to optimize the total accuracy and speed across multiple video frames.
- Score: 94.90294600817215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human pose estimation has achieved significant progress in recent years.
However, most recent methods focus on improving accuracy using complicated
models while ignoring real-time efficiency. To achieve a better
trade-off between accuracy and efficiency, we propose a novel neural
architecture search (NAS) method, termed ViPNAS, to search networks in both
spatial and temporal levels for fast online video pose estimation. In the
spatial level, we carefully design the search space with five different
dimensions including network depth, width, kernel size, group number, and
attentions. In the temporal level, we search over a series of temporal feature
fusions to optimize the total accuracy and speed across multiple video frames.
To the best of our knowledge, we are the first to search for the temporal
feature fusion and automatic computation allocation in videos. Extensive
experiments demonstrate the effectiveness of our approach on the challenging
COCO2017 and PoseTrack2018 datasets. Our discovered model family, S-ViPNAS and
T-ViPNAS, achieves significantly higher inference speed (real-time on CPU)
without sacrificing accuracy compared to previous state-of-the-art methods.
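To make the searched dimensions concrete, here is a minimal Python sketch of a spatial-plus-temporal search space in the spirit of the abstract; the candidate values, stage structure, and fusion options are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch (not the authors' code) of a search space covering the five
# spatial dimensions named in the abstract plus a temporal-fusion choice.
# All candidate values below are illustrative assumptions.
import random

SPATIAL_SPACE = {
    "depth":       [2, 3, 4],           # blocks per stage
    "width":       [32, 64, 96, 128],   # channels per stage
    "kernel_size": [3, 5, 7],
    "groups":      [1, 2, 4],           # grouped-convolution factor
    "attention":   [False, True],       # insert an attention module or not
}
TEMPORAL_FUSIONS = ["none", "add", "concat", "gated"]  # hypothetical options

def sample_architecture(num_stages: int = 4, seed: int = 0) -> dict:
    """Draw one spatio-temporal candidate uniformly from the search space."""
    rng = random.Random(seed)
    stages = [{dim: rng.choice(vals) for dim, vals in SPATIAL_SPACE.items()}
              for _ in range(num_stages)]
    return {"stages": stages, "fusion": rng.choice(TEMPORAL_FUSIONS)}

if __name__ == "__main__":
    print(sample_architecture())
```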
Related papers
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
However, NAS suffers from a key bottleneck: numerous architectures must be evaluated during the search process.
We propose SMEM-NAS, a pairwise-comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
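The pairwise-comparison idea can be sketched in a few lines: instead of estimating each architecture's absolute accuracy, a surrogate predicts which of two candidates is better and drives tournament selection. The predictor below is a toy stand-in, not the SMEM-NAS surrogate.

```python
# Toy sketch of pairwise-comparison-guided selection (not the SMEM-NAS code):
# a surrogate ranks two candidates relative to each other, so parents can be
# chosen without fully training and evaluating every architecture.
import random

def predicted_better(a: dict, b: dict) -> bool:
    """Stand-in surrogate: prefer a higher accuracy proxy, then fewer params."""
    return (a["acc_proxy"], -a["params"]) > (b["acc_proxy"], -b["params"])

def tournament_select(population: list, rng: random.Random) -> dict:
    """Pick a parent via one pairwise comparison instead of full evaluation."""
    a, b = rng.sample(population, 2)
    return a if predicted_better(a, b) else b

rng = random.Random(0)
population = [{"id": i, "acc_proxy": rng.random(), "params": rng.randint(1, 9)}
              for i in range(8)]
print(tournament_select(population, rng)["id"])
```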
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- POPNASv2: An Efficient Multi-Objective Neural Architecture Search Technique [7.497722345725035]
This paper proposes a new version of the Pareto-optimal Progressive Neural Architecture Search, called POPNASv2.
Our approach enhances its first version and improves its performance.
Our efforts allow POPNASv2 to achieve PNAS-like performance with an average 4x factor search time speed-up.
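As a concrete illustration of the Pareto-optimal selection such multi-objective methods rely on, the sketch below keeps only candidates that are not dominated on (accuracy, search-time cost); it is a generic illustration, not the POPNASv2 algorithm.

```python
# Generic Pareto-front filter over (accuracy up, time cost down); an
# illustrative sketch, not the POPNASv2 implementation.
def pareto_front(candidates):
    """candidates: list of (name, accuracy, time_cost) tuples."""
    front = []
    for name, acc, cost in candidates:
        dominated = any(a2 >= acc and c2 <= cost and (a2 > acc or c2 < cost)
                        for _, a2, c2 in candidates)
        if not dominated:
            front.append((name, acc, cost))
    return front

# "C" is dominated by "A" (better accuracy at lower cost), so only A and B,
# which trade accuracy against cost, survive.
print(pareto_front([("A", 0.91, 10), ("B", 0.93, 25), ("C", 0.90, 30)]))
```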
arXiv Detail & Related papers (2022-10-06T14:51:54Z)
- NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition [45.168746142597946]
We propose a new processing framework called Neural Architecture Search-Temporal Convolutional (NAS-TC).
In the first stage, a classical CNN serves as the backbone network to perform the computationally intensive feature extraction.
In the second stage, a simple cell-stitching search handles the relatively lightweight extraction of long-range temporal dependencies.
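A rough PyTorch sketch of this two-stage layout, with a toy backbone and assumed shapes (not the authors' released model):

```python
# Two-stage layout sketched from the summary above: a 2D CNN backbone
# extracts per-frame features (stage 1), then a dilated 1D temporal
# convolution models long-range dependencies across frames (stage 2).
import torch
import torch.nn as nn

class BackboneThenTemporalConv(nn.Module):
    def __init__(self, feat_dim=64, num_classes=10):
        super().__init__()
        # Stage 1: computationally heavy per-frame feature extractor (toy).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Stage 2: lightweight dilated temporal convolution over frames.
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=3,
                                  padding=2, dilation=2)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, video):                              # (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.backbone(video.flatten(0, 1)).flatten(1)  # (B*T, C)
        f = f.view(b, t, -1).transpose(1, 2)               # (B, C, T)
        f = self.temporal(f).mean(dim=2)                   # pool over time
        return self.head(f)

logits = BackboneThenTemporalConv()(torch.randn(2, 8, 3, 64, 64))
```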
arXiv Detail & Related papers (2021-03-17T02:02:11Z)
- Efficient Model Performance Estimation via Feature Histories [27.008927077173553]
An important step in neural network design is evaluating a model's performance.
In this work, we use the evolution history of features of a network during the early stages of training to build a proxy classifier.
We show that our method can be combined with multiple search algorithms to find better solutions to a wide range of tasks.
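The proxy idea can be sketched as follows: summarize how a network's features drift over the first few training epochs, then fit a cheap classifier that predicts final quality from that history. The drift summary, toy data, and labels below are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch of a feature-history proxy: early-epoch feature evolution
# stands in for full training when ranking candidate networks.
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_history(snapshots):
    """Summarize evolution as norms of epoch-to-epoch feature changes."""
    return np.array([np.linalg.norm(b - a)
                     for a, b in zip(snapshots, snapshots[1:])])

rng = np.random.default_rng(0)
# Toy data: 20 architectures, 5 early-epoch snapshots of 32-dim features each.
histories = np.stack([feature_history([rng.normal(size=32) for _ in range(5)])
                      for _ in range(20)])
is_good = np.array([i % 2 for i in range(20)])  # placeholder quality labels
proxy = LogisticRegression().fit(histories, is_good)
print(proxy.predict(histories[:3]))  # cheap early-stage ranking signal
```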
arXiv Detail & Related papers (2021-03-07T20:41:57Z)
- PV-NAS: Practical Neural Architecture Search for Video Recognition [83.77236063613579]
Deep neural networks for video tasks are highly customized, and designing such networks requires domain experts and costly trial and error.
Recent advances in network architecture search have boosted image recognition performance by a large margin.
In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS).
arXiv Detail & Related papers (2020-11-02T08:50:23Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
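One common way to make spatial self-attention cheap, consistent with the abstract's description though not necessarily the paper's exact formulation, is to replace the softmax with an inexpensive normalization so the matrix product can be reordered from (QK^T)V to Q(K^TV), dropping the cost from quadratic to linear in the number of pixels:

```python
# Sketch of the reordering trick behind fast/linear attention variants
# (assumed formulation): with softmax replaced by L2 normalization, the
# (B, C, C) context matrix K^T V is computed first, so cost scales with
# the number of spatial positions N rather than N^2.
import torch
import torch.nn.functional as F

def fast_spatial_attention(q, k, v):
    """q, k, v: (B, N, C), where N = H*W flattened spatial positions."""
    q = F.normalize(q, dim=-1)          # cheap normalization, not softmax
    k = F.normalize(k, dim=-1)
    context = k.transpose(1, 2) @ v     # (B, C, C): small, independent of N
    return (q @ context) / q.shape[1]   # (B, N, C)

out = fast_spatial_attention(*torch.randn(3, 2, 4096, 64))
```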
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy and 2x in inference speed, with a model under 5 MB of storage.
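A loose sketch of a multi-granularity 3D-convolutional representation in this spirit (an assumed reading of the abstract, not the released T-C3D model):

```python
# Encode the clip at two temporal granularities with a shared 3D conv stem
# and fuse the pooled features into one action representation (illustrative).
import torch
import torch.nn as nn

conv3d = nn.Sequential(nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool3d(1), nn.Flatten())

clip = torch.randn(2, 3, 16, 56, 56)      # (B, C, T, H, W)
fine = conv3d(clip)                       # fine granularity: all 16 frames
coarse = conv3d(clip[:, :, ::4])          # coarse granularity: 4 frames
fused = torch.cat([fine, coarse], dim=1)  # (B, 64) action representation
```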
arXiv Detail & Related papers (2020-06-17T06:30:43Z)