A Comprehensive Survey of Dataset Distillation
- URL: http://arxiv.org/abs/2301.05603v4
- Date: Sun, 24 Dec 2023 14:45:22 GMT
- Title: A Comprehensive Survey of Dataset Distillation
- Authors: Shiye Lei and Dacheng Tao
- Abstract summary: Deep learning technology has developed unprecedentedly in the last decade.
It has become challenging to handle the unlimited growth of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
- Score: 73.15482472726555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning technology has developed unprecedentedly in the last decade and
has become the primary choice in many application domains. This progress is
mainly attributed to a systematic collaboration in which rapidly growing
computing resources encourage advanced algorithms to deal with massive data.
However, it has gradually become challenging to handle the unlimited growth of
data with limited computing power. To this end, diverse approaches have been proposed
to improve data processing efficiency. Dataset distillation, a dataset
reduction method, addresses this problem by synthesizing a small typical
dataset from substantial data and has attracted much attention from the deep
learning community. Existing dataset distillation methods can be taxonomized
into meta-learning and data matching frameworks according to whether they
explicitly mimic the performance of target data. Although dataset distillation
has shown surprising performance in compressing datasets, several open challenges
remain, such as distilling high-resolution data or data with
complex label spaces. This paper provides a holistic understanding of dataset
distillation from multiple aspects, including distillation frameworks and
algorithms, factorized dataset distillation, performance comparison, and
applications. Finally, we discuss challenges and promising directions to
further promote future studies on dataset distillation.
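To make the taxonomy in the abstract concrete, the two frameworks can be summarized by their optimization objectives. The following is a minimal sketch under assumed notation (the symbols \mathcal{T} for the original dataset, \mathcal{S} for the synthetic dataset, \mathcal{L} for the training loss, \phi for a matched quantity, and D for a distance are not taken from the paper itself). Meta-learning methods pose a bi-level problem that directly optimizes performance on the target data:

  \min_{\mathcal{S}} \; \mathcal{L}_{\mathcal{T}}\bigl(\theta^{*}(\mathcal{S})\bigr)
  \quad \text{s.t.} \quad
  \theta^{*}(\mathcal{S}) = \arg\min_{\theta} \mathcal{L}_{\mathcal{S}}(\theta)

Data-matching methods instead align a surrogate quantity \phi (e.g., gradients, features, or training trajectories) computed on the synthetic and original data along a model's training trajectory \{\theta_{t}\}:

  \min_{\mathcal{S}} \; \sum_{t} D\bigl(\phi(\mathcal{S}; \theta_{t}), \phi(\mathcal{T}; \theta_{t})\bigr)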
Related papers
- What is Dataset Distillation Learning? [32.99890244958794]
We study the behavior, representativeness, and point-wise information content of distilled data.
We reveal that distilled data cannot serve as a substitute for real data during training.
We provide a framework for interpreting distilled data and reveal that individual distilled data points contain meaningful semantic information.
arXiv Detail & Related papers (2024-06-06T17:28:56Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most, based on their causal effects on the distillation outcome.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data (a minimal toy sketch of this idea appears after the related-papers list below).
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset of synthetic samples, such that models trained on it yield performance comparable to models trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- Data Distillation: A Survey [32.718297871027865]
Deep learning has led to the curation of a vast number of massive and multifarious datasets.
Although deep models achieve close-to-human performance on individual tasks, training parameter-hungry models on large datasets poses multi-faceted problems.
Data distillation approaches aim to synthesize terse data summaries, which can serve as effective drop-in replacements of the original dataset.
arXiv Detail & Related papers (2023-01-11T02:25:10Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
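As promised above, here is a minimal, hypothetical PyTorch sketch of the core idea of dataset distillation (synthetic data optimized so that a model trained on it approximates one trained on the real data), using the bi-level formulation on toy data. The toy dataset, linear model, unroll length, and hyperparameters are illustrative assumptions and do not reproduce the method of any paper summarized above.

# A minimal, hypothetical sketch of the bi-level ("meta-learning") formulation of
# dataset distillation on toy data; data, model, and hyperparameters are
# illustrative assumptions, not the method of any paper summarized above.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a large real dataset: 256 samples, 20 features, 2 classes.
x_real = torch.randn(256, 20)
y_real = (x_real[:, 0] > 0).long()

# Learnable synthetic set: 4 samples with learnable soft labels (stored as logits).
x_syn = torch.randn(4, 20, requires_grad=True)
y_syn = torch.randn(4, 2, requires_grad=True)
opt_syn = torch.optim.Adam([x_syn, y_syn], lr=0.05)

def inner_train(x, y_label_logits, steps=5, lr=0.1):
    """Fit a linear classifier on the synthetic set with differentiable, unrolled SGD."""
    w = torch.zeros(20, 2, requires_grad=True)
    b = torch.zeros(2, requires_grad=True)
    params = (w, b)
    for _ in range(steps):
        logits = x @ params[0] + params[1]
        loss = F.cross_entropy(logits, y_label_logits.softmax(dim=1))
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = tuple(p - lr * g for p, g in zip(params, grads))
    return params

for step in range(200):
    opt_syn.zero_grad()
    w, b = inner_train(x_syn, y_syn)                       # inner loop: train on synthetic data
    outer_loss = F.cross_entropy(x_real @ w + b, y_real)   # outer loss: evaluate on real data
    outer_loss.backward()                                  # backpropagate through the unrolled inner loop
    opt_syn.step()
    if step % 50 == 0:
        acc = ((x_real @ w + b).argmax(dim=1) == y_real).float().mean().item()
        print(f"step {step:3d}  outer loss {outer_loss.item():.3f}  real acc {acc:.3f}")

The key design choice is differentiating through the unrolled inner SGD loop (create_graph=True), which is what allows the outer optimizer to update the synthetic samples and their soft labels.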