Pixel-Wise Contrastive Distillation
- URL: http://arxiv.org/abs/2211.00218v3
- Date: Tue, 16 Apr 2024 13:22:08 GMT
- Title: Pixel-Wise Contrastive Distillation
- Authors: Junqiang Huang, Zichao Guo
- Abstract summary: We present a pixel-level self-supervised distillation framework friendly to dense prediction tasks.
Our method, called Pixel-Wise Contrastive Distillation (PCD), distills knowledge by attracting the corresponding pixels of the student's and teacher's output feature maps.
- Score: 3.274323556083613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a simple but effective pixel-level self-supervised distillation framework friendly to dense prediction tasks. Our method, called Pixel-Wise Contrastive Distillation (PCD), distills knowledge by attracting the corresponding pixels of the student's and teacher's output feature maps. PCD includes a novel design called SpatialAdaptor which "reshapes" a part of the teacher network while preserving the distribution of its output features. Our ablation experiments suggest that this reshaping behavior enables more informative pixel-to-pixel distillation. Moreover, we utilize a plug-in multi-head self-attention module that explicitly relates the pixels of the student's feature maps to enhance the effective receptive field, leading to a more competitive student. PCD outperforms previous self-supervised distillation methods on various dense prediction tasks. A ResNet-18-FPN backbone distilled by PCD achieves 37.4 AP^bbox and 34.0 AP^mask on the COCO dataset using a Mask R-CNN detector. We hope our study will inspire future research on how to pre-train a small model friendly to dense prediction tasks in a self-supervised fashion.
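The core idea of attracting corresponding pixels can be illustrated with an InfoNCE-style loss over flattened feature maps. The sketch below is a minimal, hypothetical rendering of pixel-wise contrastive distillation, not the paper's exact loss: for each student pixel, the teacher pixel at the same location is the positive and all other teacher pixels are negatives; the `temperature` value and the plain cosine-similarity logits are assumptions.

```python
import numpy as np

def pixelwise_contrastive_loss(student, teacher, temperature=0.2):
    """InfoNCE-style loss that attracts each student pixel to the
    teacher pixel at the same spatial location (the positive) and
    repels it from all other teacher pixels (the negatives).

    student, teacher: arrays of shape (H, W, C) -- output feature maps.
    """
    h, w, c = student.shape
    s = student.reshape(h * w, c)
    t = teacher.reshape(h * w, c)
    # L2-normalize so dot products are cosine similarities.
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    logits = s @ t.T / temperature                       # (HW, HW) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Row-wise log-softmax; the positive pair for pixel i is the diagonal entry.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

With identical student and teacher maps the diagonal dominates each row and the loss is small; with unrelated maps it approaches log(H*W), which is why minimizing it pulls corresponding pixels together.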
Related papers
- Preserving Angles Improves Feature Distillation of Foundation Models [8.572967695281054]
A method for preserving similarities between a compressed-space network and a student image model is presented.
It is shown that, across a variety of CossNet datasets, the method produces accurate results with greater robustness on detection benchmarks.
This provides a competitive pathway for training on general detection benchmarks.
arXiv Detail & Related papers (2024-11-22T01:48:44Z) - PromptKD: Unsupervised Prompt Distillation for Vision-Language Models [40.858721356497085]
We introduce an unsupervised domain prompt distillation framework, which aims to transfer the knowledge of a larger teacher model to a lightweight target model.
Our framework consists of two distinct stages. In the initial stage, we pre-train a large CLIP teacher model using domain (few-shot) labels.
In the subsequent stage, the stored class vectors are shared across teacher and student image encoders for calculating the predicted logits.
arXiv Detail & Related papers (2024-03-05T08:53:30Z) - Identifying Important Group of Pixels using Interactions [5.2980803808373516]
We propose a method, MoXI, that efficiently identifies a group of pixels with high prediction confidence.
The proposed method employs game-theoretic concepts, Shapley values and interactions, taking into account the effects of individual pixels.
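The Shapley value the summary refers to is a standard game-theoretic quantity: a player's average marginal contribution over all coalitions. Below is a self-contained exact computation for a small player set; treating pixels as players and model confidence on the retained pixels as the payoff function is the assumed usage, and the toy additive payoff is purely illustrative, not MoXI's actual scoring.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's weighted average marginal
    contribution over all coalitions of the other players.
    `value` maps a frozenset of players to a scalar payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Number of orderings in which exactly `s` precedes p, over n!.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi
```

For an additive payoff (each pixel contributes independently), the Shapley value of each pixel recovers exactly its own contribution; interaction indices extend this idea to contributions of pixel *groups*, which is what makes them expensive and motivates MoXI's efficiency focus. Note the exact computation is exponential in the number of players, so it only runs on small sets.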
arXiv Detail & Related papers (2024-01-08T10:06:52Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the greedy data needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - Masked Distillation with Receptive Tokens [44.99434415373963]
Distilling from feature maps can be fairly effective for dense prediction tasks.
We introduce a learnable embedding dubbed receptive token to localize pixels of interest in the feature map.
Our method dubbed MasKD is simple and practical, and needs no priors of tasks in application.
arXiv Detail & Related papers (2022-05-29T07:32:00Z) - Knowledge Distillation via the Target-aware Transformer [83.03578375615614]
We propose a novel one-to-all spatial matching knowledge distillation approach.
Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student features.
Our approach surpasses the state-of-the-art methods by a significant margin on various computer vision benchmarks.
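The one-to-all matching described above can be sketched as a soft attention from each teacher pixel over every student location. This is a rough, hypothetical simplification of the target-aware transformer idea (similarity-weighted reconstruction plus a mean-squared error), not the paper's actual loss; the scaling by sqrt(C) is an assumption borrowed from standard attention.

```python
import numpy as np

def one_to_all_distill_loss(student, teacher):
    """Each teacher pixel is compared against *all* student locations;
    the similarity-weighted average of student pixels is pulled toward
    the teacher pixel via an MSE loss.

    student, teacher: arrays of shape (H, W, C) -- feature maps.
    """
    h, w, c = teacher.shape
    s = student.reshape(h * w, c)
    t = teacher.reshape(h * w, c)
    sim = t @ s.T / np.sqrt(c)               # (HW, HW) teacher-to-student similarity
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over all student locations
    recon = attn @ s                         # reconstruct each teacher pixel from students
    return np.mean((recon - t) ** 2)
```

Contrast this with the strictly positional matching in PCD above: here a teacher pixel can draw signal from any student location, which relaxes the assumption that student and teacher features are spatially aligned.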
arXiv Detail & Related papers (2022-05-22T10:26:54Z) - Aligning Logits Generatively for Principled Black-Box Knowledge Distillation [49.43567344782207]
Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server.
We formalize a two-step workflow consisting of deprivatization and distillation.
We propose a new method Mapping-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one.
arXiv Detail & Related papers (2022-05-21T02:38:16Z) - Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
We propose Pixel Distillation that extends knowledge distillation into the input level while simultaneously breaking architecture constraints.
Such a scheme can achieve flexible cost control for deployment, as it allows the system to adjust both network architecture and image quality according to the overall requirement of resources.
arXiv Detail & Related papers (2021-12-17T14:31:40Z) - Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
We present a simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
We achieve new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one- and two-stage detectors.
arXiv Detail & Related papers (2021-09-27T08:26:00Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.