Condensed Data Expansion Using Model Inversion for Knowledge Distillation
- URL: http://arxiv.org/abs/2408.13850v2
- Date: Mon, 10 Nov 2025 05:52:04 GMT
- Title: Condensed Data Expansion Using Model Inversion for Knowledge Distillation
- Authors: Kuluhan Binici, Shivam Aggarwal, Cihan Acar, Nam Trung Pham, Karianto Leman, Gim Hee Lee, Tulika Mitra
- Abstract summary: We propose a method that expands condensed datasets using model inversion. By creating synthetic data that complements the condensed samples, we enrich the training set and better approximate the underlying data distribution. Our method demonstrates significant gains in KD accuracy compared to using condensed datasets alone.
- Score: 39.800536851433776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Condensed datasets offer a compact representation of larger datasets, but training models directly on them or using them to enhance model performance through knowledge distillation (KD) can result in suboptimal outcomes due to limited information. To address this, we propose a method that expands condensed datasets using model inversion, a technique for generating synthetic data based on the impressions of a pre-trained model on its training data. This approach is particularly well-suited for KD scenarios, as the teacher model is already pre-trained and retains knowledge of the original training data. By creating synthetic data that complements the condensed samples, we enrich the training set and better approximate the underlying data distribution, leading to improvements in student model accuracy during knowledge distillation. Our method demonstrates significant gains in KD accuracy compared to using condensed datasets alone and outperforms standard model inversion-based KD methods by up to 11.4% across various datasets and model architectures. Importantly, it remains effective even when using as few as one condensed sample per class, and can also enhance performance in few-shot scenarios where only limited real data samples are available.
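To make the abstract's pipeline concrete, below is a minimal sketch of its two ingredients: class-conditional model inversion against the pre-trained teacher, and knowledge distillation on the union of condensed and synthetic samples. This is illustrative only, written in PyTorch; `teacher`, `student`, the optimizers, the image shape, and the simple L2 image prior are assumptions made for the sketch rather than the authors' actual choices.

```python
# Minimal sketch (not the authors' code): expand a condensed set with
# model-inversion samples from a pre-trained teacher, then distill a student.
import torch
import torch.nn.functional as F

def invert_batch(teacher, targets, shape=(3, 32, 32), steps=200, lr=0.1):
    """Class-conditional model inversion: synthesize inputs that the
    pre-trained teacher confidently assigns to the given target labels."""
    teacher.eval()
    x = torch.randn(len(targets), *shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(teacher(x), targets)   # teacher's "impression" of each class
        loss = loss + 1e-4 * x.pow(2).mean()          # crude image prior; BN-statistic or TV
                                                      # regularizers are common substitutes
        loss.backward()
        opt.step()
    return x.detach()

def distill_step(student, teacher, x, student_opt, T=4.0):
    """One KD step: match the student's softened predictions to the teacher's."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    student_opt.zero_grad()
    kd.backward()
    student_opt.step()
    return kd.item()

# Hypothetical usage: enrich the condensed set with synthetic samples, then distill.
# x_cond holds the condensed images; labels is a LongTensor of class targets.
# x_syn = invert_batch(teacher, labels)
# for xb in torch.cat([x_cond, x_syn]).split(128):
#     distill_step(student, teacher, xb, student_opt)
```

Consistent with the abstract, the synthetic batch is meant to complement the condensed samples rather than replace them, so the student is distilled on both.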
Related papers
- Dataset Distillation for Pre-Trained Self-Supervised Vision Models [43.50190223507616]
Dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples.
We introduce a method of dataset distillation for this task called Linear Gradient Matching.
Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models.
arXiv Detail & Related papers (2025-11-20T18:59:57Z)
- DAViD: Data-efficient and Accurate Vision Models from Synthetic Data [6.829390872619486]
We demonstrate that it is possible to train models on much smaller but high-fidelity synthetic datasets.
Our models require only a fraction of the cost of training and inference when compared with foundational models of similar accuracy.
arXiv Detail & Related papers (2025-07-21T08:17:41Z)
- The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation [37.38634940034755]
This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation.
We rigorously evaluate the impact of these data manipulations on student model performance across multiple reasoning datasets.
arXiv Detail & Related papers (2025-05-24T15:54:19Z)
- De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts [32.1016787150064]
Data-Free Knowledge Distillation (DFKD) is a promising approach for training high-performance small models for practical deployment without relying on the original training data.
Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data.
This paper proposes a novel causal-inference perspective to disentangle student models from the impact of such distribution shifts.
arXiv Detail & Related papers (2024-03-28T16:13:22Z)
- Towards Adversarially Robust Dataset Distillation by Curvature Regularization [11.02948004359488]
Dataset distillation (DD) allows datasets to be distilled to a fraction of their original size while preserving rich distributional information.
Recent research in this area has focused on improving the accuracy of models trained on distilled datasets.
We propose a new method that additionally achieves adversarial robustness by incorporating curvature regularization into the distillation process, with much less computational overhead than standard adversarial training.
arXiv Detail & Related papers (2024-03-15T06:31:03Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models.
Most existing KD techniques rely on Kullback-Leibler (KL) divergence.
We propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale [55.97546756258374]
We show that employing stronger data augmentation techniques and using larger datasets can directly decrease the gap between vanilla KD and other meticulously designed KD variants.
Our investigation of the vanilla KD and its variants in more complex schemes, including stronger training strategies and different model capacities, demonstrates that vanilla KD is elegantly simple but astonishingly effective in large-scale scenarios.
arXiv Detail & Related papers (2023-05-25T06:50:08Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks toward a state similar to that of networks trained on real data (a minimal sketch of this trajectory-matching step appears after the related-papers list).
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We use a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data [5.064036314529226]
We propose a data-free KD framework that maintains a dynamic collection of generated samples over time.
Our experiments demonstrate that we can improve the accuracy of the student models obtained via KD when compared with state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T08:11:08Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Data Impressions: Mining Deep Models to Extract Samples for Data-free Applications [26.48630545028405]
"Data Impressions" act as proxy to the training data and can be used to realize a variety of tasks.
We show the applicability of data impressions in solving several computer vision tasks.
arXiv Detail & Related papers (2021-01-15T11:37:29Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
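For the trajectory-matching entry above (Dataset Distillation by Matching Training Trajectories), a minimal sketch of one outer optimization step is shown here. It is not the authors' code: the tiny functional linear classifier, the checkpoint lists `expert_start`/`expert_end`, and the hyperparameters are stand-ins; the point is only the structure of unrolling a few SGD steps on the synthetic data and backpropagating a normalized weight-distance loss into that data.

```python
# Illustrative sketch of trajectory matching (assumptions noted in the lead-in).
import torch
import torch.nn.functional as F

def forward(params, x):
    # Tiny linear classifier, kept functional so the unrolled loop stays explicit.
    w, b = params
    return x @ w + b

def trajectory_matching_step(x_syn, y_syn, syn_opt, expert_start, expert_end,
                             inner_steps=10, inner_lr=0.1):
    """One outer step: unroll SGD on the synthetic data from an expert
    checkpoint, then move x_syn so the unrolled weights land near the
    weights reached later by training on real data."""
    params = [p.clone().requires_grad_(True) for p in expert_start]
    for _ in range(inner_steps):                        # inner loop on synthetic data
        loss = F.cross_entropy(forward(params, x_syn), y_syn)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
    # Normalized distance between the unrolled weights and the expert's later weights.
    num = sum(((p - e) ** 2).sum() for p, e in zip(params, expert_end))
    den = sum(((s - e) ** 2).sum() for s, e in zip(expert_start, expert_end))
    match_loss = num / (den + 1e-8)
    syn_opt.zero_grad()
    match_loss.backward()
    syn_opt.step()
    return match_loss.item()

# Hypothetical usage: x_syn is a learnable tensor of synthetic inputs (requires_grad=True)
# optimized by syn_opt = torch.optim.SGD([x_syn], lr=0.1); expert_start / expert_end are
# parameter snapshots [w, b] taken earlier and later along a real-data training run.
```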
This list is automatically generated from the titles and abstracts of the papers on this site.