Related papers: Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

URL: http://arxiv.org/abs/2205.01271v1
Date: Tue, 3 May 2022 02:08:04 GMT
Title: Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
Authors: Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, and Song Han
Abstract summary: We study efficient architecture design for real-time multi-person pose estimation on edge. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation. We introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.
Score: 35.765304656180355
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to the high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. We reveal that HRNet's high-resolution branches are redundant for models at the low-computation region via our gradual shrinking experiments. Removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost. With only 25% computation increment, 7x7 kernels achieve +14.0 mAP better than 3x3 kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces the latency by up to 5.0x without sacrificing performance, compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge. Our code and pre-trained models are released at https://github.com/mit-han-lab/litepose.

Related papers

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention [63.64944381130373]
In particular, predominant pose estimation methods estimate human joints by 2D single-peak heatmaps. We introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution. Our SUSA reduces the computational complexity of the pointwise (1x1) convolution by 96% without sacrificing accuracy.
arXiv Detail & Related papers (2023-10-12T05:33:25Z)
InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely IncepitonNeXt, which not only enjoy high throughputs but also maintain competitive performance. InceptionNeXt achieves 1.6x higher training throughputs than ConvNeX-T, as well as attains 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z)
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view representation space. It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
Rethinking Deconvolution for 2D Human Pose Estimation Light yet Accurate Model for Real-time Edge Computing [0.0]
This system was found to be very accurate and achieved a 94.5% accuracy of SOTA HRNet 256x192. Our model adopts an encoder-decoder architecture and is carefully downsized to improve its efficiency.
arXiv Detail & Related papers (2021-11-08T01:44:46Z)
EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search [47.30243595690131]
We propose an efficient framework targeted at human pose estimation including two parts, the efficient backbone and the efficient head. Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model.
arXiv Detail & Related papers (2020-12-13T15:38:38Z)
Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image. The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images. We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
EfficientHRNet: Efficient Scaling for Lightweight High-Resolution Multi-Person Pose Estimation [2.924868086534434]
We present EfficientHRNet, a family of lightweight multi-person human pose estimators that are able to perform in real-time on resource-constrained devices. The largest model is able to come within 4.4% accuracy of the current state-of-the-art, while having 1/3 the model size and 1/6 the power. Compared to the top real-time approach, EfficientHRNet increases accuracy by 22% while achieving similar FPS with 1/3 the power.
arXiv Detail & Related papers (2020-07-16T03:27:26Z)
Making DensePose fast and light [78.49552144907513]
Existing neural network models capable of solving this task are heavily parameterized. To enable Dense Pose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection. In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more light-weight and fast.
arXiv Detail & Related papers (2020-06-26T19:42:20Z)
EfficientPose: Scalable single-person pose estimation [3.325625311163864]
We propose a novel convolutional neural network architecture, called EfficientPose, for single-person pose estimation. Our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets. Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost.
arXiv Detail & Related papers (2020-04-25T16:50:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.