Related papers: FasterPose: A Faster Simple Baseline for Human Pose Estimation

URL: http://arxiv.org/abs/2107.03215v1
Date: Wed, 7 Jul 2021 13:39:08 GMT
Title: FasterPose: A Faster Simple Baseline for Human Pose Estimation
Authors: Hanbin Dai, Hailin Shi, Wu Liu, Linfang Wang, Yinglu Liu and Tao Mei
Abstract summary: We propose a design paradigm for cost-effective network with LR representation for efficient pose estimation, named FasterPose. We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence. Compared with the previously dominant network of pose estimation, our method reduces 58% of the FLOPs and simultaneously gains 1.3% improvement of accuracy.
Score: 65.8413964785972
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The performance of human pose estimation depends on the spatial accuracy of keypoint localization. Most existing methods pursue the spatial accuracy through learning the high-resolution (HR) representation from input images. By the experimental analysis, we find that the HR representation leads to a sharp increase of computational cost, while the accuracy improvement remains marginal compared with the low-resolution (LR) representation. In this paper, we propose a design paradigm for cost-effective network with LR representation for efficient pose estimation, named FasterPose. Whereas the LR design largely shrinks the model complexity, yet how to effectively train the network with respect to the spatial accuracy is a concomitant challenge. We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence and promoting the accuracy. The RCE loss generalizes the ordinary cross-entropy loss from the binary supervision to a continuous range, thus the training of pose estimation network is able to benefit from the sigmoid function. By doing so, the output heatmap can be inferred from the LR features without loss of spatial accuracy, while the computational cost and model size has been significantly reduced. Compared with the previously dominant network of pose estimation, our method reduces 58% of the FLOPs and simultaneously gains 1.3% improvement of accuracy. Extensive experiments show that FasterPose yields promising results on the common benchmarks, i.e., COCO and MPII, consistently validating the effectiveness and efficiency for practical utilization, especially the low-latency and low-energy-budget applications in the non-GPU scenarios.
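The abstract describes the RCE loss only in words: a cross-entropy whose supervision is a continuous value rather than a binary label, applied to a sigmoid output. As a rough illustration of that idea, the sketch below implements ordinary sigmoid cross-entropy with soft targets in [0, 1]. It is not the paper's exact RCE formulation, which the abstract does not give; the function names and the Gaussian heatmap target are assumptions made for this example.

```python
import math


def soft_target_bce(logit, target):
    """Cross-entropy between sigmoid(logit) and a target in [0, 1].

    Assumption: a plain soft-label generalization of binary
    cross-entropy, not the RCE loss as defined in the paper.
    """
    # Numerically stable form of
    #   -t * log(sigmoid(z)) - (1 - t) * log(1 - sigmoid(z))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def gaussian_target(x, y, cx, cy, sigma):
    """Hypothetical continuous heatmap target centered on a keypoint."""
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))


def heatmap_loss(logits, targets):
    """Mean per-pixel loss over a flattened heatmap."""
    return sum(soft_target_bce(z, t) for z, t in zip(logits, targets)) / len(logits)


if __name__ == "__main__":
    # A 3x3 low-resolution heatmap with the keypoint at the center pixel.
    targets = [gaussian_target(x, y, 1, 1, sigma=1.0)
               for y in range(3) for x in range(3)]
    logits = [0.0] * 9  # untrained network: sigmoid(0) = 0.5 everywhere
    print(heatmap_loss(logits, targets))
```

With a target of exactly 0 or 1 this reduces to ordinary binary cross-entropy; for an intermediate target t, the per-pixel loss is minimized when sigmoid(logit) equals t, which is what lets a continuous heatmap value act as the supervision signal.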

Related papers

EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration [17.190325630307097]
We propose an Exponential-Decay Free-Form Deformation Network (EDFFDNet), which employs free-form deformation with an exponential-decay basis function. By transforming dense interactions into sparse ones, ASMA reduces parameters and improves accuracy. Experiments demonstrate that EDFFDNet reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively. EDFFDNet-2 further improves PSNR by 1.06 dB while maintaining lower computational costs.
arXiv Detail & Related papers (2025-09-09T12:30:51Z)
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens [20.716935111971384]
We propose a one-stage framework for real-time multi-person 3D human mesh estimation from a single RGB image. We introduce scale-adaptive tokens that are dynamically adjusted based on the relative scale of each individual in the image. Experiments show that our method preserves the accuracy benefits of high-resolution processing while substantially reducing computational cost.
arXiv Detail & Related papers (2024-11-29T16:34:46Z)
Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images. Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning [9.33753001494221]
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation.
arXiv Detail & Related papers (2023-04-08T22:48:30Z)
Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations. We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model. Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection. SCRFD-34GF outperforms the best competitor, TinaFace, by 3.86% (AP at hard set) while being more than 3× faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z)
Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian relaxation (SLR). SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement. Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search [47.30243595690131]
We propose an efficient framework targeted at human pose estimation including two parts, the efficient backbone and the efficient head. Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model.
arXiv Detail & Related papers (2020-12-13T15:38:38Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
EfficientPose: Scalable single-person pose estimation [3.325625311163864]
We propose a novel convolutional neural network architecture, called EfficientPose, for single-person pose estimation. Our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets. Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost.
arXiv Detail & Related papers (2020-04-25T16:50:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
