Dataset Distillation for Pre-Trained Self-Supervised Vision Models
- URL: http://arxiv.org/abs/2511.16674v1
- Date: Thu, 20 Nov 2025 18:59:57 GMT
- Title: Dataset Distillation for Pre-Trained Self-Supervised Vision Models
- Authors: George Cazenavette, Antonio Torralba, Vincent Sitzmann
- Abstract summary: Dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. We introduce a method of dataset distillation for this task called Linear Gradient Matching. Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models.
- Score: 43.50190223507616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly initialized models. In contrast, state-of-the-art vision approaches are increasingly building on large, pre-trained self-supervised models rather than training from scratch. In this paper, we investigate the problem of distilling datasets that enable us to optimally train linear probes on top of such large, pre-trained vision models. We introduce a method of dataset distillation for this task called Linear Gradient Matching that optimizes the synthetic images such that, when passed through a pre-trained feature extractor, they induce gradients in the linear classifier similar to those produced by the real data. Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models, enabling us, for instance, to train a linear CLIP probe that performs competitively using a dataset distilled via a DINO backbone. Further, we show that our distilled datasets are exceptionally effective for fine-grained classification and provide a valuable tool for model interpretability, predicting, among other things, how similar two models' embedding spaces are under the platonic representation hypothesis or whether a model is sensitive to spurious correlations in adversarial datasets.
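The core optimization is simple enough to express directly. Below is a minimal PyTorch sketch of one Linear Gradient Matching step under stated assumptions: a frozen pre-trained backbone, a linear probe, synthetic images held as optimizable leaf tensors, and a cosine-distance matching loss. The function names and the choice of matching loss are illustrative guesses, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gradient_match_step(syn_images, syn_labels, real_images, real_labels,
                        backbone, probe, image_opt):
    """One step of linear gradient matching (illustrative sketch).

    backbone:   frozen pre-trained feature extractor (e.g. a DINO ViT)
    probe:      linear classifier on top of the frozen features
    syn_images: leaf tensor with requires_grad=True, owned by image_opt
    """
    params = list(probe.parameters())

    # Gradient of the probe's loss on a real batch (no graph kept).
    real_feats = backbone(real_images).detach()
    real_loss = F.cross_entropy(probe(real_feats), real_labels)
    real_grads = torch.autograd.grad(real_loss, params)

    # Gradient on the synthetic batch; keep the graph so the matching
    # loss can backpropagate all the way into the synthetic pixels.
    syn_feats = backbone(syn_images)
    syn_loss = F.cross_entropy(probe(syn_feats), syn_labels)
    syn_grads = torch.autograd.grad(syn_loss, params, create_graph=True)

    # Match the two gradient sets; cosine distance is one plausible choice.
    match_loss = sum(
        1 - F.cosine_similarity(g_s.flatten(), g_r.flatten(), dim=0)
        for g_s, g_r in zip(syn_grads, real_grads)
    )

    image_opt.zero_grad()
    match_loss.backward()  # only the synthetic images are updated
    image_opt.step()
    return match_loss.item()
```

In practice this step would be repeated over many real batches, presumably with the probe's weights resampled or re-trained periodically so the match holds across classifier states; those scheduling details are not specified in the abstract.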
Related papers
- Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation [19.552569546864913]
We propose a technique to distill images and their self-supervised representations into a distilled set. This procedure effectively extracts rich information from real datasets, yielding distilled sets with enhanced cross-architecture generalizability. In particular, we introduce a novel parameterization of images and representations via distinct low-dimensional bases (a hypothetical sketch follows this entry).
arXiv Detail & Related papers (2025-07-29T02:51:56Z)
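The low-dimensional parameterization can be pictured as follows: rather than optimizing raw pixels and full feature vectors, each distilled sample is a learned combination of shared bases, so only small tensors are optimized. All shapes and names below are assumptions, including the choice to share coefficients between the image and representation bases.

```python
import torch

n_distilled, k = 100, 16           # distilled samples, number of bases
C, H, W, feat_dim = 3, 32, 32, 768

# Distinct low-dimensional bases for images and for representations,
# plus per-sample coefficients; only these small tensors are optimized.
image_bases = torch.randn(k, C * H * W, requires_grad=True)
feat_bases  = torch.randn(k, feat_dim, requires_grad=True)
coeffs      = torch.randn(n_distilled, k, requires_grad=True)

distilled_images = (coeffs @ image_bases).view(n_distilled, C, H, W)
distilled_feats  = coeffs @ feat_bases    # paired target representations

opt = torch.optim.Adam([image_bases, feat_bases, coeffs], lr=1e-3)
```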
- Contrastive Learning-Enhanced Trajectory Matching for Small-Scale Dataset Distillation [0.7560883489000576]
We propose a novel dataset distillation method integrating contrastive learning during image synthesis. Our approach produces more informative and diverse synthetic samples, even when dataset sizes are significantly constrained (see the sketch below).
arXiv Detail & Related papers (2025-05-21T08:46:29Z)
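One plausible reading of "contrastive learning during image synthesis" is an InfoNCE-style term added to the trajectory-matching objective so that synthetic samples remain mutually discriminable. The encoder, `augment`, and the weighting `lam` below are placeholders, not the paper's specified components.

```python
import torch
import torch.nn.functional as F

def contrastive_term(syn_images, encoder, augment, temperature=0.1):
    """InfoNCE over two augmented views of the synthetic batch (sketch)."""
    z1 = F.normalize(encoder(augment(syn_images)), dim=1)
    z2 = F.normalize(encoder(augment(syn_images)), dim=1)
    logits = z1 @ z2.t() / temperature       # (N, N) view-similarity matrix
    targets = torch.arange(len(syn_images))  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# total_loss = trajectory_matching_loss + lam * contrastive_term(...)
```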
- Dataset Distillation with Probabilistic Latent Features [9.318549327568695]
A compact set of synthetic data can effectively replace the original dataset in downstream classification tasks. We propose a novel approach that models the joint distribution of latent features (a toy version follows this entry). Our method achieves state-of-the-art cross-architecture performance across a range of backbone architectures.
arXiv Detail & Related papers (2025-05-10T13:53:49Z)
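A toy instantiation of "modeling the joint distribution of latent features" is to fit a class-conditional Gaussian to a backbone's features and sample compact synthetic feature sets from it; the paper's actual generative model is surely richer, so treat this purely as an illustration.

```python
import torch

def sample_synthetic_features(class_feats, n_per_class, jitter=1e-4):
    """class_feats: (N, D) real features of one class -> (n_per_class, D)."""
    mu = class_feats.mean(dim=0)
    # Regularize the empirical covariance so it is positive definite.
    cov = torch.cov(class_feats.t()) + jitter * torch.eye(class_feats.shape[1])
    dist = torch.distributions.MultivariateNormal(mu, covariance_matrix=cov)
    return dist.sample((n_per_class,))
```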
- Golden Ratio Weighting Prevents Model Collapse [7.512957145774808]
We develop an optimal training strategy for integrating real and synthetic data. We characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio (illustrated below).
arXiv Detail & Related papers (2025-02-25T10:15:16Z)
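The golden-ratio claim is concrete: in the special case the summary mentions, the real-data term would receive weight 1/φ = φ − 1 ≈ 0.618 and the synthetic term the remainder. A schematic weighted objective, with all names hypothetical:

```python
PHI = (1 + 5 ** 0.5) / 2   # golden ratio, about 1.618
W_REAL = 1 / PHI           # about 0.618, the summary's special case

def mixed_loss(loss_real, loss_synth, w=W_REAL):
    # Convex combination of losses on real and synthetic samples.
    return w * loss_real + (1 - w) * loss_synth
```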
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of the synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to the randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models (a schematic of the retraining loop follows this entry).
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
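The experimental protocol is essentially a retraining loop: each generation trains on a mixture of real data and samples from the previous model, and the stability result says the loop avoids collapse when the initial model is accurate enough and a sufficient share of real data is retained. A schematic, with every function passed in as a placeholder:

```python
def iterative_retraining(model, real_data, n_generations, n_synth,
                         train, sample, mix):
    """Schematic loop for retraining a generative model on its own output."""
    for _ in range(n_generations):
        synthetic = sample(model, n_synth)   # draw from the current model
        dataset = mix(real_data, synthetic)  # keep a fixed share of real data
        model = train(model, dataset)        # refit on the mixture
    return model
```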
- The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models [0.15469452301122172]
This study presents a framework for the generation of synthetic datasets by fine-tuning Stable Diffusion models.
The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model.
arXiv Detail & Related papers (2023-06-16T10:48:52Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space (see the sketch after this entry).
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
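Distilling into a generative model's latent space amounts to optimizing a handful of latent codes through a frozen generator instead of raw pixels, which builds a natural-image prior into the distilled data. A minimal sketch, with `generator` and `distillation_loss` as stand-ins for the paper's components:

```python
import torch

def distill_into_latents(generator, distillation_loss, n_codes, latent_dim,
                         steps=1000, lr=1e-2):
    """Optimize latent codes of a frozen generator (illustrative sketch)."""
    latents = torch.randn(n_codes, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([latents], lr=lr)
    for _ in range(steps):
        images = generator(latents)       # frozen generator acts as a prior
        loss = distillation_loss(images)  # any pixel-space distillation loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latents.detach()
```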
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage (a minimal sketch follows this entry).
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
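Self-distillation as a regularizer can be read as: snapshot the model after a first round of further pre-training, then keep pre-training while penalizing divergence from the snapshot's predictions. The loss composition and `beta` below are assumptions:

```python
import copy
import torch
import torch.nn.functional as F

def self_distill_step(model, teacher, batch, pretrain_loss, opt, beta=1.0):
    """Further pre-training step with a self-distillation penalty (sketch)."""
    student_logits = model(batch)
    with torch.no_grad():
        teacher_logits = teacher(batch)   # frozen earlier checkpoint
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")
    loss = pretrain_loss(student_logits, batch) + beta * kl
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# teacher = copy.deepcopy(model).eval()   # snapshot after the first stage
```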
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data (sketched after this entry).
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
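Trajectory matching, by the same first author as the present paper, unrolls a few differentiable training steps on the distilled data and penalizes the distance between the resulting weights and an expert checkpoint trained on real data, normalized by how far the expert moved. A compressed sketch; `functional_net` (a network applied with explicit parameters) and the hyperparameters are assumptions:

```python
import torch
import torch.nn.functional as F

def trajectory_match_loss(syn_images, syn_labels, expert_start, expert_end,
                          functional_net, inner_steps=10, inner_lr=0.01):
    """Unroll training on distilled data, then match the expert's move."""
    student = [p.detach().clone().requires_grad_(True) for p in expert_start]
    for _ in range(inner_steps):
        loss = F.cross_entropy(functional_net(syn_images, student), syn_labels)
        grads = torch.autograd.grad(loss, student, create_graph=True)
        student = [p - inner_lr * g for p, g in zip(student, grads)]

    # Normalized parameter distance: how close the student lands to the
    # expert's endpoint, relative to how far the expert travelled.
    num = sum(((s - e) ** 2).sum() for s, e in zip(student, expert_end))
    den = sum(((s0 - e) ** 2).sum() for s0, e in zip(expert_start, expert_end))
    return num / den  # backprop through the unrolled steps reaches syn_images
```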