Online Knowledge Distillation for Efficient Pose Estimation
- URL: http://arxiv.org/abs/2108.02092v1
- Date: Wed, 4 Aug 2021 14:49:44 GMT
- Title: Online Knowledge Distillation for Efficient Pose Estimation
- Authors: Zheng Li, Jingwen Ye, Mingli Song, Ying Huang, Zhigeng Pan
- Abstract summary: We investigate a novel Online Knowledge Distillation framework by distilling Human Pose structure knowledge in a one-stage manner.
OKDHP trains a single multi-branch network and acquires the predicted heatmaps from each branch.
The pixel-wise Kullback-Leibler divergence is utilized to minimize the discrepancy between the target heatmaps and the predicted ones.
- Score: 37.81478634850458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing state-of-the-art human pose estimation methods require heavy
computational resources for accurate predictions. One promising technique to
obtain an accurate yet lightweight pose estimator is knowledge distillation,
which distills the pose knowledge from a powerful teacher model to a
less-parameterized student model. However, existing pose distillation works
rely on a heavy pre-trained estimator to perform knowledge transfer and require
a complex two-stage learning procedure. In this work, we investigate a novel
Online Knowledge Distillation framework, termed OKDHP, which distills human
pose structure knowledge in a one-stage manner to keep the distillation
efficient. Specifically, OKDHP trains a single multi-branch network and
acquires the predicted heatmaps from each branch, which are then assembled by
a Feature Aggregation Unit (FAU) into target heatmaps that teach each branch
in reverse. Instead of simply averaging the heatmaps, the FAU, which consists
of multiple parallel transformations with different receptive fields,
leverages multi-scale information and thus produces higher-quality target
heatmaps. The pixel-wise Kullback-Leibler (KL) divergence is then used to
minimize the discrepancy between the target heatmaps and the predicted ones,
which enables the student network to learn the implicit keypoint
relationships.
In addition, an unbalanced OKDHP scheme is introduced to customize the student
networks with different compression rates. The effectiveness of our approach is
demonstrated by extensive experiments on two common benchmark datasets, MPII
and COCO.
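The abstract combines two ingredients: target heatmaps assembled from the branch predictions (by the FAU) and a pixel-wise KL divergence that pulls each branch toward those targets. The following is a minimal PyTorch sketch of that loss, not the authors' implementation; the (B, K, H, W) heatmap shape, the temperature, the MSE supervision term, the alpha weight, and the plain averaging that stands in for the FAU are illustrative assumptions.

```python
# Minimal sketch of the pixel-wise KL distillation idea described in the
# abstract, NOT the released OKDHP implementation. Shapes, the temperature,
# and the simplified aggregation are assumptions for illustration.
import torch
import torch.nn.functional as F


def pixelwise_kl(student_heatmaps, target_heatmaps, temperature=1.0):
    """KL divergence between spatial distributions of each keypoint heatmap.

    Both tensors are assumed to have shape (B, K, H, W): batch, keypoints,
    height, width. Each keypoint heatmap is flattened and normalized with a
    softmax so it can be treated as a distribution over pixel locations.
    """
    b, k, h, w = student_heatmaps.shape
    s = student_heatmaps.view(b, k, h * w)
    t = target_heatmaps.view(b, k, h * w)
    log_p = F.log_softmax(s / temperature, dim=-1)  # student log-probs
    q = F.softmax(t / temperature, dim=-1)          # target probs
    # Summed over keypoints and pixels, averaged over the batch dimension.
    return F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)


def okd_loss(branch_heatmaps, gt_heatmaps, alpha=0.5):
    """One-stage online distillation loss over a list of branch outputs.

    `branch_heatmaps` is a list of (B, K, H, W) predictions from the
    multi-branch network. The target heatmaps are built here by a plain
    average, standing in for the paper's Feature Aggregation Unit.
    """
    target = torch.stack(branch_heatmaps, dim=0).mean(dim=0).detach()
    loss = 0.0
    for pred in branch_heatmaps:
        mse = F.mse_loss(pred, gt_heatmaps)  # supervision from labels
        kl = pixelwise_kl(pred, target)      # distillation from the ensemble
        loss = loss + mse + alpha * kl
    return loss / len(branch_heatmaps)
```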
Related papers
- Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - AICSD: Adaptive Inter-Class Similarity Distillation for Semantic
Segmentation [12.92102548320001]
This paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation.
The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs.
Experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-08-08T13:17:20Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Knowledge Distillation for Object Detection via Rank Mimicking and
Prediction-guided Feature Imitation [34.441349114336994]
We propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors.
RM takes the rank of candidate boxes from teachers as a new form of knowledge to distill.
PFI attempts to correlate feature differences with prediction differences, making feature imitation directly help to improve the student's accuracy.
arXiv Detail & Related papers (2021-12-09T11:19:15Z) - Boosting Light-Weight Depth Estimation Via Knowledge Distillation [21.93879961636064]
We propose a lightweight network that can accurately estimate depth maps using minimal computing resources.
We achieve this by designing a compact model architecture that maximally reduces model complexity.
Our method achieves comparable performance to state-of-the-art methods while using only 1% of their parameters.
arXiv Detail & Related papers (2021-05-13T08:42:42Z) - Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose
Estimation [1.0323063834827415]
We propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities.
Taking the two teachers together, an orderly learning strategy is proposed to promote knowledge absorption.
Our proposed ODKD can improve the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
arXiv Detail & Related papers (2021-04-21T08:50:36Z) - SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other within a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z) - Differentiable Feature Aggregation Search for Knowledge Distillation [47.94874193183427]
We introduce feature aggregation to imitate multi-teacher distillation within a single-teacher distillation framework.
DFA is a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search.
Experimental results show that DFA outperforms existing methods on CIFAR-100 and CINIC-10 datasets.
arXiv Detail & Related papers (2020-08-02T15:42:29Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
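As a rough illustration of the similarity-transfer idea summarized in the last item above, the sketch below treats the teacher's pairwise similarities between a batch of (augmented) embeddings as soft targets for the student; the cosine-similarity choice, the temperature, and the KL objective are assumptions for illustration rather than that paper's exact formulation.

```python
# Generic sketch of similarity-based knowledge transfer in the spirit of
# "Knowledge Distillation Meets Self-Supervision": the teacher's pairwise
# similarities between samples become soft targets for the student.
import torch
import torch.nn.functional as F


def similarity_transfer_loss(student_feats, teacher_feats, temperature=0.5):
    """KL divergence between teacher and student pairwise-similarity distributions.

    Both inputs are (N, D) feature batches, e.g. embeddings of N augmented
    views produced by the two networks.
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    # Pairwise cosine similarities, excluding self-similarity on the diagonal.
    n = s.size(0)
    mask = ~torch.eye(n, dtype=torch.bool, device=s.device)
    sim_s = (s @ s.t())[mask].view(n, n - 1)
    sim_t = (t @ t.t())[mask].view(n, n - 1)
    log_p = F.log_softmax(sim_s / temperature, dim=-1)  # student distribution
    q = F.softmax(sim_t / temperature, dim=-1)          # teacher distribution
    return F.kl_div(log_p, q, reduction="batchmean")
```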