Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
- URL: http://arxiv.org/abs/2205.10490v2
- Date: Sat, 30 Mar 2024 08:52:40 GMT
- Title: Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
- Authors: Jing Ma, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li
- Abstract summary: Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server.
We formalize a two-step workflow consisting of deprivatization and distillation.
We propose a new method Mapping-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one.
- Score: 49.43567344782207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction from logits to cell boundary, different from direct logits alignment. With its guidance, we propose a new method, Mapping-Emulation KD (MEKD), that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between soft and hard responses, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning low-dimensional logits of the teacher and student models by reducing the distance of high-dimensional image points. For different teacher-student pairs, our method yields strong distillation performance on various benchmarks, and outperforms the previous state-of-the-art approaches.
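As a rough illustration of the distillation step, the toy sketch below (not the authors' implementation; a fixed random linear lift stands in for the trained inverse-mapping generator, and all names are hypothetical) aligns teacher and student logits indirectly, by minimizing the distance between the high-dimensional points the generator produces from each model's logits:

```python
import random

random.seed(0)
LOGIT_DIM, IMAGE_DIM = 3, 16

# Toy stand-in for the trained generator that emulates the inverse mapping
# of the teacher function: here, just a fixed random linear lift.
W = [[random.gauss(0.0, 1.0) for _ in range(LOGIT_DIM)] for _ in range(IMAGE_DIM)]

def generator(logits):
    """Map low-dimensional logits to a high-dimensional 'image' point."""
    return [sum(w * z for w, z in zip(row, logits)) for row in W]

def image_space_loss(teacher_logits, student_logits):
    """Align logits indirectly: squared distance between generated points."""
    g_t, g_s = generator(teacher_logits), generator(student_logits)
    return sum((a - b) ** 2 for a, b in zip(g_t, g_s))
```

Because this generator is deterministic and (almost surely) injective, the loss vanishes exactly when the two logit vectors coincide, so driving it down aligns the logits without comparing them directly.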
Related papers
- Multi-Level Decoupled Relational Distillation for Heterogeneous Architectures [6.231548250160585]
Multi-Level Decoupled Knowledge Distillation (MLDR-KD) improves student model performance, with gains of up to 4.86% on CIFAR-100 and 2.78% on Tiny-ImageNet.
arXiv Detail & Related papers (2025-02-10T06:41:20Z)
- Inverse Bridge Matching Distillation [69.479483488685]
Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation.
We propose a novel distillation technique based on the inverse bridge matching formulation and derive the tractable objective to solve it in practice.
We evaluate our approach for both conditional and unconditional types of bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks.
arXiv Detail & Related papers (2025-02-03T13:56:03Z)
- Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration [17.27061613884289]
We propose a novel dynamic contrastive knowledge distillation (DCKD) framework for image restoration.
Specifically, we introduce dynamic contrastive regularization to perceive the student's learning state.
We also propose a distribution mapping module to extract and align the pixel-level category distribution of the teacher and student models.
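A hedged, minimal sketch of the contrastive ingredient (hypothetical names; a plain InfoNCE-style loss standing in for the paper's dynamic contrastive regularization): pull the student's feature toward the teacher's feature (the positive) and away from negative features such as degraded restorations:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_kd_loss(student_feat, teacher_feat, negative_feats, tau=0.1):
    """InfoNCE-style loss: attract the student toward the teacher feature
    (positive pair) and repel it from the negatives."""
    pos = math.exp(cosine(student_feat, teacher_feat) / tau)
    negs = sum(math.exp(cosine(student_feat, n) / tau) for n in negative_feats)
    return -math.log(pos / (pos + negs))
```

The loss is small when the student feature sits near the teacher's and large when it drifts toward a negative, which is the behavior a contrastive regularizer exploits.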
arXiv Detail & Related papers (2024-12-12T05:01:17Z)
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
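The geometric idea can be sketched as follows (a simplification under assumed names, not the EmbedDistill objective itself): match the pairwise distances among a batch of query/document embeddings rather than the embeddings themselves, so the student preserves the teacher's relative geometry:

```python
import math

def pairwise_dists(embs):
    """All pairwise Euclidean distances within a batch of embeddings."""
    n = len(embs)
    return [math.dist(embs[i], embs[j]) for i in range(n) for j in range(i + 1, n)]

def geometry_loss(teacher_embs, student_embs):
    """Mean squared mismatch between teacher and student pairwise distances.
    Invariant to rotations and translations of the student space, by design."""
    d_t, d_s = pairwise_dists(teacher_embs), pairwise_dists(student_embs)
    return sum((a - b) ** 2 for a, b in zip(d_t, d_s)) / len(d_t)
```

Because only relative distances are compared, the student is free to place its embedding space anywhere, as long as the arrangement of queries and documents matches the teacher's.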
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval [54.54667085792404]
We propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders.
Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher.
arXiv Detail & Related papers (2022-05-18T18:05:13Z)
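To make the cross-architecture gap above concrete, here is a minimal sketch (hypothetical helper names; a ColBERT-style MaxSim standing in for "late interaction"): the teacher scores with per-token maximum similarities, the dual-encoder student with a single dot product, and distillation simply drives the student's score toward the teacher's:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tok_embs, doc_tok_embs):
    """ColBERT-style late interaction: for each query token, take its best
    match among document tokens, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tok_embs) for q in query_tok_embs)

def dual_encoder_score(q_emb, d_emb):
    """Vanilla dual-encoder: one pooled vector per side, single dot product."""
    return dot(q_emb, d_emb)

def distill_loss(teacher_score, student_score):
    """On-the-fly distillation target: the student score chases the teacher's."""
    return (teacher_score - student_score) ** 2
```

The teacher's token-level scoring is more expressive than the student's single dot product, which is why score-level distillation (rather than weight copying) is the natural bridge between the two architectures.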
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.