Can pre-trained models assist in dataset distillation?
- URL: http://arxiv.org/abs/2310.03295v1
- Date: Thu, 5 Oct 2023 03:51:21 GMT
- Title: Can pre-trained models assist in dataset distillation?
- Authors: Yao Lu, Xuguang Chen, Yuchen Zhang, Jianyang Gu, Tianle Zhang, Yifan
Zhang, Xiaoniu Yang, Qi Xuan, Kai Wang, Yang You
- Abstract summary: Pre-trained Models (PTMs) function as knowledge repositories, containing extensive information from the original dataset.
This naturally raises a question: Can PTMs effectively transfer knowledge to synthetic datasets, guiding DD accurately?
We systematically study different options in PTMs, including initialization parameters, model architecture, training epochs and domain knowledge.
- Score: 21.613468512330442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset Distillation (DD) is a prominent technique that encapsulates
knowledge from a large-scale original dataset into a small synthetic dataset
for efficient training. Meanwhile, Pre-trained Models (PTMs) function as
knowledge repositories, containing extensive information from the original
dataset. This naturally raises a question: Can PTMs effectively transfer
knowledge to synthetic datasets, guiding DD accurately? To this end, we conduct
preliminary experiments, confirming the contribution of PTMs to DD. Afterwards,
we systematically study different options in PTMs, including initialization
parameters, model architecture, training epochs and domain knowledge, revealing
that: 1) Increasing model diversity enhances the performance of synthetic
datasets; 2) Sub-optimal models can also assist in DD and outperform
well-trained ones in certain cases; 3) Domain-specific PTMs are not mandatory
for DD, but a reasonable domain match is crucial. Finally, by selecting optimal
options, we significantly improve the cross-architecture generalization over
baseline DD methods. We hope our work will facilitate researchers to develop
better DD techniques. Our code is available at
https://github.com/yaolu-zjut/DDInterpreter.
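The abstract does not spell out the guidance mechanism, so the following is only a minimal sketch of one way frozen PTMs could steer distillation, via matching feature statistics between synthetic and real data; the model pool, feature extractor, loss, and hyperparameters are illustrative assumptions rather than the authors' exact method.
```python
# Minimal sketch of PTM-guided dataset distillation via feature matching:
# learnable synthetic images are optimized so that their features under a
# pool of frozen pre-trained models match those of real data. The model
# pool, loss, and hyperparameters are illustrative assumptions, not the
# paper's exact method.
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# A small pool of diverse frozen PTMs (finding 1: model diversity helps).
ptm_pool = [
    models.resnet18(weights=models.ResNet18_Weights.DEFAULT),
    models.vgg11(weights=models.VGG11_Weights.DEFAULT),
]
for m in ptm_pool:
    m.eval().to(device)
    for p in m.parameters():
        p.requires_grad_(False)

def features(model, x):
    """Penultimate-layer features: drop the final classifier head."""
    body = torch.nn.Sequential(*list(model.children())[:-1])
    return body(x).flatten(1)

# The synthetic dataset is a small set of learnable images.
num_syn, C, H, W = 10, 3, 224, 224
syn_images = torch.randn(num_syn, C, H, W, device=device, requires_grad=True)
optimizer = torch.optim.Adam([syn_images], lr=0.1)

def distill_step(real_batch):
    """Pull the synthetic set's PTM features toward those of a real batch."""
    optimizer.zero_grad()
    loss = torch.zeros((), device=device)
    for m in ptm_pool:
        with torch.no_grad():
            f_real = features(m, real_batch.to(device)).mean(dim=0)
        f_syn = features(m, syn_images).mean(dim=0)
        loss = loss + F.mse_loss(f_syn, f_real)
    loss.backward()
    optimizer.step()
    return loss.item()
```
Under this sketch, the paper's ablations would correspond to what goes into `ptm_pool`: early-epoch or deliberately sub-optimal checkpoints (finding 2), and models from matching versus mismatched source domains (finding 3).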
Related papers
- Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks [10.932880269282014]
We propose the first effective DD method for SSL pre-training.
Specifically, we train a small student model to match the representations of a larger teacher model trained with SSL.
As the KD objective has considerably lower variance than SSL, our approach can generate synthetic datasets that can successfully pre-train high-quality encoders.
arXiv Detail & Related papers (2024-10-03T00:39:25Z)
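The entry above describes training a small student to match a frozen teacher's representations as a lower-variance stand-in for an SSL objective. A minimal sketch of that objective, assuming a cosine loss and an ImageNet-pretrained ResNet-50 in place of the SSL-trained teacher:
```python
# Minimal sketch of the representation-matching KD objective that replaces a
# high-variance SSL loss: a small student is trained to match a frozen
# teacher's embeddings. An ImageNet-pretrained ResNet-50 stands in for the
# SSL-trained teacher; the cosine loss and projection head are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

teacher = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
student = models.resnet18(weights=None)
for p in teacher.parameters():
    p.requires_grad_(False)

# Project 512-d student features into the teacher's 2048-d embedding space.
proj = torch.nn.Linear(512, 2048)
opt = torch.optim.SGD(list(student.parameters()) + list(proj.parameters()), lr=0.05)

def embed(model, x):
    """Penultimate-layer embeddings (classifier head dropped)."""
    body = torch.nn.Sequential(*list(model.children())[:-1])
    return body(x).flatten(1)

def kd_step(images):
    """One step of the lower-variance KD target: match teacher embeddings."""
    opt.zero_grad()
    with torch.no_grad():
        t = embed(teacher, images)
    s = proj(embed(student, images))
    loss = 1.0 - F.cosine_similarity(s, t, dim=1).mean()
    loss.backward()
    opt.step()
    return loss.item()
```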
- Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning [10.116674195405126]
We argue that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest.
Our formalization reveals novel applications of DD across different modeling environments.
We present numerical results for two case studies important in contemporary settings.
arXiv Detail & Related papers (2024-09-02T18:11:15Z)
- Exploring the Impact of Dataset Bias on Dataset Distillation [10.742404631413029]
We investigate the influence of dataset bias on Dataset Distillation (DD).
DD is a technique to synthesize a smaller dataset that preserves essential information from the original dataset.
Experiments demonstrate that biases present in the original dataset significantly impact the performance of the synthetic dataset.
arXiv Detail & Related papers (2024-03-24T06:10:22Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency [9.460023981858319]
We propose an end-to-end Adaptive DAtaset PRUNing framework called AdaPruner.
AdaPruner iteratively prunes redundant samples to an expected pruning ratio.
It can still significantly enhance model performance even after pruning up to 10-30% of the training data.
arXiv Detail & Related papers (2023-12-09T16:01:21Z)
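The AdaPruner entry above only states that redundant samples are pruned iteratively toward an expected ratio; the loop below is a rough sketch of such a procedure, with per-sample loss as a stand-in redundancy score:
```python
# Rough sketch of iterative pruning toward a target ratio. The importance
# score used here (per-sample loss under the current model, low loss =
# redundant) and the round schedule are assumptions, not AdaPruner's
# actual criterion.
import torch

def iterative_prune(model, dataset, loss_fn, target_ratio=0.2, rounds=5, device="cpu"):
    """Drop `target_ratio` of the samples over several rounds, removing the
    lowest-scoring (most redundant) ones each round."""
    keep = list(range(len(dataset)))
    drop_per_round = int(len(dataset) * target_ratio / rounds)
    for _ in range(rounds):
        model.eval()
        scores = []
        with torch.no_grad():
            for i in keep:
                x, y = dataset[i]
                logits = model(x.unsqueeze(0).to(device))
                scores.append(loss_fn(logits, torch.tensor([y], device=device)).item())
        order = sorted(range(len(keep)), key=lambda j: scores[j])  # easiest first
        keep = [keep[j] for j in order[drop_per_round:]]
        # In the full framework the model would be updated on the kept subset here.
    return keep  # indices of the retained training samples
```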
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
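The PDD entry above describes the procedure concretely: synthesize several small synthetic subsets in sequence, condition each on the subsets already produced, and train on their cumulative union. A schematic sketch under those assumptions, with `make_model` and `synthesize_subset` as hypothetical placeholders for any student architecture and base DD method:
```python
# Schematic sketch of progressive distillation: synthesize several small
# synthetic subsets in sequence, condition each on the subsets produced so
# far, and train on their cumulative union. `make_model` and
# `synthesize_subset` are hypothetical placeholders; their interfaces are
# assumptions, not PDD's actual API.
import torch

def train_on(model, synthetic_set, epochs=10, lr=0.01):
    """Plain supervised training on a list of (image, label) pairs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in synthetic_set:
            opt.zero_grad()
            loss = loss_fn(model(x.unsqueeze(0)), torch.tensor([y]))
            loss.backward()
            opt.step()

def progressive_dd(real_loader, make_model, synthesize_subset,
                   num_stages=3, images_per_stage=100):
    cumulative = []  # union of all synthetic subsets produced so far
    for stage in range(num_stages):
        model = make_model()
        # Condition on earlier stages: pre-train on the union so far, so the
        # new subset only needs to add information the union still lacks.
        if cumulative:
            train_on(model, cumulative)
        cumulative.extend(synthesize_subset(model, real_loader, images_per_stage))
    return cumulative  # final synthetic dataset: the cumulative union
```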
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- Back to the Source: Diffusion-Driven Test-Time Adaptation [77.4229736436935]
Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data.
We instead update the target data by projecting all test inputs toward the source domain with a generative diffusion model.
arXiv Detail & Related papers (2022-07-07T17:14:10Z)
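The entry above adapts the data rather than the model: each shifted test input is projected toward the source domain with a diffusion model trained on source data. A schematic sketch of that projection:
```python
# Schematic sketch of diffusion-driven projection: partially noise a shifted
# test input, then denoise it with a diffusion model trained on *source*
# data, pulling the input back toward the source domain before classifying
# it. The DDPM schedule and the `eps_model(x_t, t)` noise-prediction
# interface are assumptions, not the paper's exact implementation.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def project_to_source(x0, eps_model, t_start=300):
    """Forward-diffuse x0 to step t_start, then run the reverse process to 0."""
    b, a, ab = (s.to(x0.device) for s in (betas, alphas, alpha_bars))
    # Partial forward diffusion: keep coarse content, wash out domain detail.
    x = ab[t_start].sqrt() * x0 + (1 - ab[t_start]).sqrt() * torch.randn_like(x0)
    # Reverse (ancestral) DDPM sampling with the source-trained model.
    for t in range(t_start, -1, -1):
        eps = eps_model(x, torch.tensor([t], device=x0.device))
        mean = (x - (b[t] / (1 - ab[t]).sqrt()) * eps) / a[t].sqrt()
        x = mean + b[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x  # test input projected toward the source domain
```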
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge numbers of parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in those parameters can benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z)