FasterPose: A Faster Simple Baseline for Human Pose Estimation
- URL: http://arxiv.org/abs/2107.03215v1
- Date: Wed, 7 Jul 2021 13:39:08 GMT
- Title: FasterPose: A Faster Simple Baseline for Human Pose Estimation
- Authors: Hanbin Dai, Hailin Shi, Wu Liu, Linfang Wang, Yinglu Liu and Tao Mei
- Abstract summary: We propose a design paradigm for cost-effective networks with LR representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% while improving accuracy by 1.3%.
- Score: 65.8413964785972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of human pose estimation depends on the spatial accuracy of
keypoint localization. Most existing methods pursue the spatial accuracy
through learning the high-resolution (HR) representation from input images.
Through experimental analysis, we find that the HR representation leads to a
sharp increase in computational cost, while the accuracy improvement remains
marginal compared with the low-resolution (LR) representation. In this paper,
we propose a design paradigm for cost-effective networks with LR representation
for efficient pose estimation, named FasterPose. While the LR design largely
shrinks the model complexity, how to effectively train the network with respect
to spatial accuracy is a concomitant challenge. We study the training behavior
of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss
function for accelerating the convergence and promoting the accuracy. The RCE
loss generalizes the ordinary cross-entropy loss from binary supervision to a
continuous range, so that training of the pose estimation network can benefit
from the sigmoid function. By doing so, the output heatmap can be inferred from
the LR features without loss of spatial accuracy, while the computational cost
and model size are significantly reduced. Compared with the previously dominant
network for pose estimation, our method reduces FLOPs by 58% while improving
accuracy by 1.3%. Extensive experiments show that FasterPose yields promising
results on common benchmarks, i.e., COCO and MPII, consistently validating its
effectiveness and efficiency for practical use, especially in low-latency and
low-energy-budget applications in non-GPU scenarios.
Related papers
- Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
- Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks.
In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z)
- Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
- Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
SCRFD-34GF outperforms the best competitor, TinaFace, by 3.86% (AP on the hard set) while being more than 3× faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z)
- Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search [47.30243595690131]
We propose an efficient framework for human pose estimation consisting of two parts: an efficient backbone and an efficient head.
Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model.
arXiv Detail & Related papers (2020-12-13T15:38:38Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
With low-bit quantization, our FQSR achieves performance on par with its full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- EfficientPose: Scalable single-person pose estimation [3.325625311163864]
We propose a novel convolutional neural network architecture, called EfficientPose, for single-person pose estimation.
Our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets.
Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost.
arXiv Detail & Related papers (2020-04-25T16:50:46Z)