Effective Whole-body Pose Estimation with Two-stages Distillation
- URL: http://arxiv.org/abs/2307.15880v2
- Date: Fri, 25 Aug 2023 02:46:35 GMT
- Title: Effective Whole-body Pose Estimation with Two-stages Distillation
- Authors: Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li
- Abstract summary: Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an image.
We present a two-stage pose Distillation for Whole-body Pose estimators, named DWPose, to improve their effectiveness and efficiency.
- Score: 52.92064408970796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole-body pose estimation localizes the human body, hand, face, and foot
keypoints in an image. This task is challenging due to multi-scale body parts,
fine-grained localization for low-resolution regions, and data scarcity.
Meanwhile, applying a highly efficient and accurate pose estimator to a wide
range of human-centric understanding and generation tasks is urgent. In this
work, we present a two-stage pose Distillation for Whole-body Pose estimators,
named DWPose, to improve their effectiveness and efficiency. The first-stage
distillation designs a weight-decay strategy
while utilizing a teacher's intermediate feature and final logits with both
visible and invisible keypoints to supervise the student from scratch. The
second stage distills the student model itself to further improve performance.
Different from the previous self-knowledge distillation, this stage finetunes
the student's head with only 20% training time as a plug-and-play training
strategy. For data limitations, we explore the UBody dataset that contains
diverse facial expressions and hand gestures for real-life applications.
Comprehensive experiments show the superiority of our proposed simple yet
effective methods. We achieve new state-of-the-art performance on
COCO-WholeBody, significantly boosting the whole-body AP of RTMPose-l from
64.8% to 66.5%, even surpassing RTMPose-x teacher with 65.3% AP. We release a
series of models with different sizes, from tiny to large, to suit
various downstream tasks. Our code and models are available at
https://github.com/IDEA-Research/DWPose.
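The first-stage objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it reads the "weight-decay strategy" as a distillation weight that shrinks over training, and the linear schedule, the MSE feature term, the KL logit term, the coefficients alpha and beta, and all function names are assumptions made for this example.

```python
import math


def decay_weight(epoch, max_epochs):
    """Assumed linear schedule: 1.0 at the first epoch, shrinking toward 0."""
    if not 1 <= epoch <= max_epochs:
        raise ValueError("epoch must lie in [1, max_epochs]")
    return 1.0 - (epoch - 1) / max_epochs


def feature_loss(student_feat, teacher_feat):
    """Mean squared error between flattened student and teacher features."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def logit_loss(student_logits, teacher_logits):
    """KL(teacher || student) over one keypoint's output logits."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def first_stage_loss(task_loss, feat_loss, kd_logit_loss,
                     epoch, max_epochs, alpha=1.0, beta=1.0):
    """Task loss plus distillation terms scaled by the decaying weight.

    alpha and beta are hypothetical balancing coefficients.
    """
    w = decay_weight(epoch, max_epochs)
    return task_loss + w * (alpha * feat_loss + beta * kd_logit_loss)
```

Under this reading, the teacher's guidance dominates early in training and fades as the student matures, so the task loss takes over toward the end. In the second stage the abstract describes, only the student's head would be fine-tuned, for about 20% of the original training time.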
Related papers
- RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation [9.121372333621538]
Whole-body pose estimation aims to predict fine-grained pose information for the human body.
We present RTMW (Real-Time Multi-person Whole-body pose estimation models), a series of high-performance models for 2D/3D whole-body pose estimation.
arXiv Detail & Related papers (2024-07-11T16:15:47Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation [38.97427474379367]
We introduce a denoising scheme to generate reliable pseudo-heatmaps as targets for learning from unlabeled data.
We select the learning targets from these pseudo-heatmaps guided by the estimated cross-student uncertainty.
Our results show that our model outperforms previous state-of-the-art semi-supervised pose estimators.
arXiv Detail & Related papers (2023-09-29T19:17:30Z)
- 2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints.
Our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods.
Our methods perform favorably against most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z)
- KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences [77.56222946832237]
We present a novel framework to detect the densepose of multiple people in an image.
The proposed method, which we refer to as the Knowledge Transfer Network (KTN), tackles two main problems.
It simultaneously maintains feature resolution and suppresses background pixels, and this strategy results in a substantial increase in accuracy.
arXiv Detail & Related papers (2022-06-21T03:11:37Z)
- Knowledge Distillation for 6D Pose Estimation by Keypoint Distribution Alignment [77.70208382044355]
We introduce the first knowledge distillation method for 6D pose estimation.
We observe that the compact student network struggles to predict precise 2D keypoint locations.
Our experiments on several benchmarks show that our distillation method yields state-of-the-art results.
arXiv Detail & Related papers (2022-05-30T10:17:17Z)
- Prediction-Guided Distillation for Dense Object Detection [7.5320132424481505]
We show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher's high detection performance.
We propose Prediction-Guided Distillation (PGD), which focuses distillation on these key predictive regions of the teacher.
Our proposed approach outperforms current state-of-the-art KD baselines on a variety of advanced one-stage detection architectures.
arXiv Detail & Related papers (2022-03-10T16:46:05Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose Estimation [1.0323063834827415]
We propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities.
Using the two teachers together, we propose an orderly learning strategy to promote knowledge absorbability.
Our proposed ODKD can improve the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
arXiv Detail & Related papers (2021-04-21T08:50:36Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.