DistilPose: Tokenized Pose Regression with Heatmap Distillation
- URL: http://arxiv.org/abs/2303.02455v2
- Date: Wed, 8 Mar 2023 05:44:22 GMT
- Title: DistilPose: Tokenized Pose Regression with Heatmap Distillation
- Authors: Suhang Ye, Yingyi Zhang, Jie Hu, Liujuan Cao, Shengchuan Zhang, Lei Shen, Jun Wang, Shouhong Ding, Rongrong Ji
- Abstract summary: We propose a novel human pose estimation framework termed DistilPose, which bridges the gaps between heatmap-based and regression-based methods.
DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through the Token-distilling Encoder (TDE) and Simulated Heatmaps.
- Score: 81.21273854769765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of human pose estimation, regression-based methods have
dominated in terms of speed, while heatmap-based methods are far ahead in terms
of performance. How to take advantage of both schemes remains a challenging
problem. In this paper, we propose a novel human pose estimation framework
termed DistilPose, which bridges the gaps between heatmap-based and
regression-based methods. Specifically, DistilPose maximizes the transfer of
knowledge from the teacher model (heatmap-based) to the student model
(regression-based) through Token-distilling Encoder (TDE) and Simulated
Heatmaps. TDE aligns the feature spaces of heatmap-based and regression-based
models by introducing tokenization, while Simulated Heatmaps transfer explicit
guidance (distribution and confidence) from teacher heatmaps into student
models. Extensive experiments show that the proposed DistilPose can
significantly improve the performance of the regression-based models while
maintaining efficiency. Specifically, on the MSCOCO validation dataset,
DistilPose-S obtains 71.6% mAP with 5.36M parameters, 2.38 GFLOPs and 40.2 FPS,
which uses 12.95x fewer parameters and 7.16x fewer GFLOPs and is 4.9x faster than its
teacher model with only a 0.9-point performance drop. Furthermore, DistilPose-L
obtains 74.4% mAP on the MSCOCO validation dataset, achieving a new
state-of-the-art among predominant regression-based models.
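The efficiency figures above can be sanity-checked by back-computing the teacher model's implied cost from the DistilPose-S numbers. The short sketch below does only that arithmetic. It assumes that the 12.95x ratio refers to parameters and the 7.16x ratio to GFLOPs (the abstract does not label them explicitly); the names `student`, `implied_teacher` and the ratio constants are illustrative, not taken from the paper.

```python
# Figures quoted in the abstract for DistilPose-S on the MSCOCO validation set.
student = {"params_m": 5.36, "gflops": 2.38, "fps": 40.2, "map": 71.6}

# Ratios quoted in the abstract. ASSUMPTION: 12.95x applies to parameters
# and 7.16x to GFLOPs; the abstract lists them without labels.
PARAM_RATIO = 12.95
GFLOPS_RATIO = 7.16
SPEEDUP = 4.9
MAP_DROP = 0.9  # points of mAP lost relative to the teacher


def implied_teacher(s):
    """Back-compute the teacher's figures implied by the quoted ratios."""
    return {
        "params_m": s["params_m"] * PARAM_RATIO,  # teacher has more parameters
        "gflops": s["gflops"] * GFLOPS_RATIO,     # teacher needs more compute
        "fps": s["fps"] / SPEEDUP,                # teacher runs slower
        "map": s["map"] + MAP_DROP,               # teacher is slightly more accurate
    }


teacher = implied_teacher(student)
for name, value in teacher.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the teacher works out to roughly 69.4M parameters, 17.0 GFLOPs, 8.2 FPS and 72.5% mAP. These are derived values, not figures stated in the abstract.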
Related papers
- Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose
Estimation [38.97427474379367]
We introduce a denoising scheme to generate reliable pseudo-heatmaps as targets for learning from unlabeled data.
We select the learning targets from these pseudo-heatmaps guided by the estimated cross-student uncertainty.
Our results show that our model outperforms previous state-of-the-art semi-supervised pose estimators.
arXiv Detail & Related papers (2023-09-29T19:17:30Z)
- Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation [71.24808323646167]
We propose DiffusionPose, a new scheme for learning keypoints heatmaps by a neural network.
During training, the keypoints are diffused to random distribution by adding noises and the diffusion model learns to recover ground-truth heatmaps from noised heatmaps.
Experiments show the prowess of our scheme with improvements of 1.6, 1.2, and 1.2 mAP on widely-used COCO, CrowdPose, and AI Challenge datasets.
arXiv Detail & Related papers (2023-06-29T16:24:32Z)
- Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as
Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Human Pose Regression with Residual Log-likelihood Estimation [48.30425850653223]
We propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process.
Compared to the conventional regression paradigm, regression with RLE brings a 12.4 mAP improvement on MSCOCO without any test-time overhead.
arXiv Detail & Related papers (2021-07-23T15:06:31Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, for each guided point, different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
- A Transfer Learning approach to Heatmap Regression for Action Unit
intensity estimation [50.261472059743845]
Action Units (AUs) are geometrically-based atomic facial muscle movements.
We propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity.
A Heatmap models whether an AU occurs or not at a given spatial location.
arXiv Detail & Related papers (2020-04-14T16:51:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.