Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose
Estimation
- URL: http://arxiv.org/abs/2104.10414v1
- Date: Wed, 21 Apr 2021 08:50:36 GMT
- Title: Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose
Estimation
- Authors: Zhong-Qiu Zhao, Yao Gao, Yuchen Ge and Weidong Tian
- Abstract summary: We propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities.
Taking the two teachers together, an orderly learning strategy is proposed to promote knowledge absorbability.
Our proposed ODKD can improve the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
- Score: 1.0323063834827415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep convolutional neural networks (DCNNs) have achieved excellent
performance in human pose estimation, these networks often have a large number
of parameters and computations, leading to slow inference speed. To address this
issue, an effective solution is knowledge distillation, which transfers
knowledge from a large pre-trained network (teacher) to a small network
(student). However, there are some defects in the existing approaches: (I) Only
a single teacher is adopted, neglecting the potential that a student can learn
from multiple teachers. (II) The human segmentation mask can be regarded as
additional prior information to restrict the locations of keypoints, yet existing
approaches never utilize it. (III) A student with a small number of parameters cannot fully
imitate heatmaps provided by datasets and teachers. (IV) There exists noise in
heatmaps generated by teachers, which causes model degradation. To overcome
these defects, we propose an orderly dual-teacher knowledge distillation (ODKD)
framework, which consists of two teachers with different capabilities.
Specifically, the weaker one (primary teacher, PT) is used to teach keypoint
information, while the stronger one (senior teacher, ST) is utilized to transfer
both segmentation and keypoint information by adding the human segmentation mask.
Taking the two teachers together, an orderly learning strategy is proposed to
promote knowledge absorbability. Moreover, we employ a binarization operation
which further improves the learning ability of the student and reduces noise in
heatmaps. Experimental results on COCO and OCHuman keypoints datasets show that
our proposed ODKD can improve the performance of different lightweight models
by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art
performance for lightweight human pose estimation.
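To make the framework concrete, below is a minimal PyTorch-style sketch of one orderly dual-teacher training step. It is only a sketch under assumptions: the abstract does not give the exact losses, so the MSE terms, the thresholding form of binarize_heatmaps, the stage switch in odkd_step, and the weight alpha are hypothetical choices, not the paper's actual formulation.

```python
# Minimal hypothetical sketch of an ODKD-style training step (loss forms assumed).
import torch
import torch.nn.functional as F


def binarize_heatmaps(heatmaps: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # Keep only high-confidence responses; an assumed form of the binarization
    # operation used to reduce noise in teacher heatmaps.
    return (heatmaps >= threshold).float() * heatmaps


def distill_loss(student_maps: torch.Tensor, teacher_maps: torch.Tensor) -> torch.Tensor:
    # Pixel-wise MSE between student heatmaps and binarized teacher heatmaps.
    return F.mse_loss(student_maps, binarize_heatmaps(teacher_maps))


def odkd_step(stage, student_kpt, student_seg, pt_kpt, st_kpt, st_seg, gt_kpt, alpha=0.5):
    # Orderly learning: stage 1 distills keypoint heatmaps from the primary
    # teacher (PT); stage 2 switches to the senior teacher (ST), which also
    # supervises a human-segmentation channel.
    gt_loss = F.mse_loss(student_kpt, gt_kpt)  # ground-truth heatmap supervision
    if stage == 1:
        kd_loss = distill_loss(student_kpt, pt_kpt)
    else:
        kd_loss = distill_loss(student_kpt, st_kpt) + distill_loss(student_seg, st_seg)
    return gt_loss + alpha * kd_loss


# Toy usage with random tensors (COCO-style 17 keypoints, 64x48 heatmaps).
B, K, H, W = 2, 17, 64, 48
loss = odkd_step(
    stage=2,
    student_kpt=torch.rand(B, K, H, W), student_seg=torch.rand(B, 1, H, W),
    pt_kpt=torch.rand(B, K, H, W),
    st_kpt=torch.rand(B, K, H, W), st_seg=torch.rand(B, 1, H, W),
    gt_kpt=torch.rand(B, K, H, W),
)
print(loss.item())
```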
Related papers
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- RdimKD: Generic Distillation Paradigm by Dimensionality Reduction [16.977144350795488]
Knowledge Distillation (KD) emerges as one of the most promising compression technologies to run advanced deep neural networks on resource-limited devices.
In this work, we propose an abstract and general paradigm for the KD task, referred to as DIMensionality Reduction KD (RdimKD).
RdimKD solely relies on dimensionality reduction, with a very minor modification to naive L2 loss.
arXiv Detail & Related papers (2023-12-14T07:34:08Z)
- Effective Whole-body Pose Estimation with Two-stages Distillation [52.92064408970796]
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an image.
We present a two-stage pose Distillation for Whole-body Pose estimators, named DWPose, to improve their effectiveness and efficiency.
arXiv Detail & Related papers (2023-07-29T03:49:28Z)
- Cross Architecture Distillation for Face Recognition [49.55061794917994]
We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-06-26T12:54:28Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Knowledge Distillation with Deep Supervision [6.8080936803807734]
We propose Deeply-Supervised Knowledge Distillation (DSKD), which fully utilizes class predictions and feature maps of the teacher model to supervise the training of shallow student layers.
A loss-based weight allocation strategy is developed in DSKD to adaptively balance the learning process of each shallow layer, so as to further improve the student performance.
arXiv Detail & Related papers (2022-02-16T03:58:21Z)
- Online Knowledge Distillation for Efficient Pose Estimation [37.81478634850458]
We investigate a novel Online Knowledge Distillation framework (OKDHP) by distilling Human Pose structure knowledge in a one-stage manner.
OKDHP trains a single multi-branch network and acquires the predicted heatmaps from each branch.
The pixel-wise Kullback-Leibler divergence is utilized to minimize the discrepancy between the target heatmaps and the predicted ones (a minimal sketch of such a pixel-wise heatmap loss is given after this list).
arXiv Detail & Related papers (2021-08-04T14:49:44Z)
- Boosting Light-Weight Depth Estimation Via Knowledge Distillation [21.93879961636064]
We propose a lightweight network that can accurately estimate depth maps using minimal computing resources.
We achieve this by designing a compact model architecture that maximally reduces model complexity.
Our method achieves comparable performance to state-of-the-art methods while using only 1% of their parameters.
arXiv Detail & Related papers (2021-05-13T08:42:42Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
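For the pixel-wise Kullback-Leibler heatmap loss referenced in the Online Knowledge Distillation entry above, the following is a minimal sketch. It assumes each keypoint heatmap is flattened and softmax-normalized into a spatial probability distribution; the temperature T and the function heatmap_kl are hypothetical and not taken from that paper.

```python
# Hypothetical pixel-wise KL divergence between predicted and target heatmaps.
import torch
import torch.nn.functional as F


def heatmap_kl(pred: torch.Tensor, target: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    # pred, target: (B, K, H, W) heatmaps. Each heatmap is flattened and
    # softmax-normalized so KL(target || pred) is taken over spatial locations.
    B, K, H, W = pred.shape
    log_p = F.log_softmax(pred.view(B, K, H * W) / T, dim=-1)  # predicted log-probs
    q = F.softmax(target.view(B, K, H * W) / T, dim=-1)        # target probs
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)


# Example: measuring the discrepancy between student and teacher heatmaps.
student = torch.rand(2, 17, 64, 48)
teacher = torch.rand(2, 17, 64, 48)
print(heatmap_kl(student, teacher).item())
```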