Lightweight Human Pose Estimation Using Heatmap-Weighting Loss
- URL: http://arxiv.org/abs/2205.10611v1
- Date: Sat, 21 May 2022 14:26:14 GMT
- Authors: Shiqi Li, Xiang Xiang
- Abstract summary: We introduce an attention mechanism that uses original, inter-level, and intra-level information to improve accuracy.
We also propose a novel loss function called heatmap weighting loss, which generates a weight for each heatmap pixel so that the model focuses more on keypoints.
- Score: 7.830376406370752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research on human pose estimation exploits complex structures to
improve performance on benchmark datasets, ignoring the resource overhead and
inference speed when the model is actually deployed. In this paper, we reduce
the computation cost and parameter count of the deconvolution head network in
SimpleBaseline and introduce an attention mechanism that uses original,
inter-level, and intra-level information to improve accuracy.
Additionally, we propose a novel loss function called heatmap weighting loss,
which generates a weight for each heatmap pixel so that the model focuses more
on keypoints. Experiments demonstrate our method achieves a balance
between performance, resource volume, and inference speed. Specifically, our
method can achieve 65.3 AP score on COCO test-dev, while the inference speed is
55 FPS and 18 FPS on the mobile GPU and CPU, respectively.
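The abstract describes the heatmap weighting loss only as generating a weight for each heatmap pixel so that the model focuses on keypoints. The following is a minimal sketch of that general idea, not the authors' formulation: the weighting function `linear_weight`, its `alpha` parameter, the Gaussian target, and the helper names are all assumptions made for illustration.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Ground-truth heatmap: a 2D Gaussian centred on the keypoint (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def linear_weight(target_value, alpha=4.0):
    """Hypothetical per-pixel weight (an assumption, not the paper's function):
    1 on the background, rising to 1 + alpha at the keypoint peak."""
    return 1.0 + alpha * target_value


def weighted_mse(pred, target, weight_fn):
    """Mean over all pixels of weight(target) * (pred - target)^2."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += weight_fn(t) * (p - t) ** 2
            count += 1
    return total / count


# One 8x8 heatmap with a keypoint at column 3, row 4.
target = gaussian_heatmap(8, 8, cx=3, cy=4)
blank = [[0.0] * 8 for _ in range(8)]  # a prediction that misses the keypoint

plain = weighted_mse(blank, target, lambda t: 1.0)   # ordinary MSE
weighted = weighted_mse(blank, target, linear_weight)  # keypoint-weighted MSE
print(plain, weighted)
```

With this weighting, errors near the keypoint peak contribute more to the loss than errors on the background, so a prediction that misses the keypoint is penalised more heavily than under plain mean squared error.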
Related papers
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
(2024-09-20T07:41:47Z)
- Lightweight Super-Resolution Head for Human Pose Estimation [42.51588635059534]
Heatmap-based methods have become the mainstream method for pose estimation.
However, heatmap-based approaches suffer from significant quantization errors with downscale heatmaps.
We propose SRPose to reduce the quantization error and dependence on further post-processing.
(2023-07-31T15:35:34Z)
- Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
(2023-07-27T13:52:42Z)
- Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras [2.3311605203774395]
Graph convolutional networks (GCNs) are a promising approach for analyzing event data.
In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity.
Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation.
(2023-07-26T11:44:44Z)
- ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction [21.994171434960734]
We present a differentiable keypoint detection module, which outputs accurate sub-pixel keypoints.
The reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and the dispersity peak loss is presented for accurate keypoints regularization.
A lightweight network is designed for keypoint detection and descriptor extraction, which can run at 95 frames per second for 640x480 images on a commercial GPU.
(2021-12-06T10:10:30Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
(2021-11-16T15:36:44Z)
- FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose FasterPose, a design paradigm for cost-effective networks that use low-resolution (LR) representations for efficient pose estimation.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant pose estimation network, our method reduces FLOPs by 58% while improving accuracy by 1.3%.
(2021-07-07T13:39:08Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
(2020-12-01T23:58:16Z)
- Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
(2020-03-17T03:52:17Z)
- Compression of descriptor models for mobile applications [26.498907514590165]
We evaluate the computational cost, model size, and matching accuracy tradeoffs for deep neural networks.
We observe a significant redundancy in the learned weights, which we exploit through the use of depthwise separable layers.
We propose the Convolution-Depthwise-Pointwise(CDP) layer, which provides a means of interpolating between the standard and depthwise separable convolutions.
(2020-01-09T17:00:21Z)