Embarrassingly Simple Dataset Distillation
- URL: http://arxiv.org/abs/2311.07025v1
- Date: Mon, 13 Nov 2023 02:14:54 GMT
- Title: Embarrassingly Simple Dataset Distillation
- Authors: Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe
- Abstract summary: We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
We devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation extracts a small set of synthetic training samples from
a large dataset, with the goal that a model trained on this small sample achieves
competitive performance on test data. In this work, we tackle dataset distillation at
its core by treating it directly as a bilevel optimization problem.
Re-examining the foundational back-propagation through time method, we study
the pronounced variance in the gradients, computational burden, and long-term
dependencies. To address these issues, we introduce an improved method, Random
Truncated Backpropagation Through Time (RaT-BPTT). RaT-BPTT incorporates a truncation
coupled with a random window, effectively stabilizing the gradients and
speeding up the optimization while covering long dependencies. This allows us
to establish a new state of the art on a variety of standard dataset benchmarks.
A deeper dive into the nature of distilled data unveils pronounced
intercorrelation. In particular, subsets of distilled datasets tend to exhibit
much worse performance than directly distilled smaller datasets of the same
size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates
distilled datasets that contain subsets with near optimal performance across
different data budgets.
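
To make the bilevel structure and the random truncation concrete, the sketch below shows one outer update in PyTorch. It is a hedged illustration, not the authors' released code: the tiny MLP, unroll length, window size, learning rate, and the handling of labels are assumptions made for brevity, and distilled_x is assumed to be a leaf tensor with requires_grad=True.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    """Tiny two-layer MLP applied with an explicit parameter list [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    return F.relu(x @ w1 + b1) @ w2 + b2

def rat_bptt_outer_step(distilled_x, distilled_y, real_x, real_y,
                        init_params, max_unroll=40, window=10, inner_lr=0.01):
    """One outer update of the distilled data via Random Truncated BPTT (sketch).

    A freshly initialized network is trained on the distilled data for a randomly
    chosen number of inner SGD steps; gradients with respect to the distilled data
    flow only through the final `window` inner steps (the truncation), and the
    window's position shifts across outer iterations because the unroll length is
    resampled each time.
    """
    params = [p.clone().requires_grad_(True) for p in init_params]
    n_steps = torch.randint(window, max_unroll + 1, (1,)).item()

    for t in range(n_steps):
        track = t >= n_steps - window            # inside the backprop window?
        loss = F.cross_entropy(forward(params, distilled_x), distilled_y)
        grads = torch.autograd.grad(loss, params, create_graph=track)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
        if not track:                            # truncate: discard the graph so far
            params = [p.detach().requires_grad_(True) for p in params]

    # Outer (meta) loss on real data; backprop reaches the distilled data.
    outer_loss = F.cross_entropy(forward(params, real_x), real_y)
    outer_loss.backward()
    return outer_loss.item()
```

An outer optimizer (for example, Adam on distilled_x) then consumes the accumulated gradient; the paper's actual schedule, learned soft labels, and averaging over random network initializations are omitted here.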
Related papers
- Distilling Long-tailed Datasets [13.330572317331198]
We propose a novel long-tailed dataset distillation method, Long-tailed dataset Aware distillation (LAD).
LAD reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset.
This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.
arXiv Detail & Related papers (2024-08-24T15:36:36Z)
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores the hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently.
Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
- Sequential Subset Matching for Dataset Distillation [44.322842898670565]
We propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch).
Our analysis indicates that SeqMatch effectively addresses the coupling issue by sequentially generating the synthetic instances.
Our code is available at https://github.com/shqii1j/seqmatch.
arXiv Detail & Related papers (2023-11-02T19:49:11Z)
- Dataset Distillation Meets Provable Subset Selection [14.158845925610438]
Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important points and removing redundant ones in the data.
We further merge the idea of data subset selection with dataset distillation by training the distilled set on "important" sampled points during the training procedure instead of randomly sampling the next batch.
arXiv Detail & Related papers (2023-07-16T15:58:19Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data.
A KRR solution can be generated using this distilled set of instances, which gives an approximation to the KRR solution optimized on the full input data (the standard closed forms involved are recalled after this list).
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with the regularization towards a flat trajectory, the weights trained on synthetic data are robust against accumulated-error perturbations.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo); a minimal sketch of the feature-regression idea appears after this list.
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
- LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4th) of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration.
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
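
For the entry "On the Size and Approximation Error of Distilled Sets" above, it helps to recall the object being approximated. In generic notation (not taken from that paper), the kernel ridge regression predictors fitted on the full data (X, y) and on a distilled set (X_s, y_s) with ridge parameter lambda have the standard closed forms:

```latex
f_X(x)     = k(x, X)\,\bigl(K_{XX} + \lambda I\bigr)^{-1} y,
\qquad
f_{X_s}(x) = k(x, X_s)\,\bigl(K_{X_s X_s} + \lambda I\bigr)^{-1} y_s .
```

The entry's claim is that, once the kernel is approximated with random Fourier features (RFF), a small distilled set X_s exists whose solution in the RFF feature space coincides with that of the full data, so that f_{X_s} approximates f_X.

For the FRePo entry, the following is a minimal sketch of the feature-regression idea, not the authors' implementation: the backbone, targets, and outer loss are illustrative assumptions, and FRePo's feature pooling and model pool are omitted. Distilled images are optimized so that a ridge-regression head, fitted in closed form on the backbone's features of the distilled set, predicts well on real batches.

```python
import torch
import torch.nn.functional as F

def feature_regression_loss(backbone, distilled_x, distilled_y_onehot,
                            real_x, real_y, ridge=1e-3):
    """Feature-regression-style distillation loss (FRePo-like sketch).

    Fit ridge regression from backbone features of the distilled set to its
    targets in closed form, then score that head on a real batch. The outer
    gradient flows through the closed-form solve into `distilled_x`.
    """
    phi_s = backbone(distilled_x)            # (n_s, d) distilled-set features
    phi_r = backbone(real_x)                 # (n_r, d) real-batch features

    # Kernel ridge regression in feature space:
    # predictions = K_rs (K_ss + ridge * I)^(-1) Y_s
    k_ss = phi_s @ phi_s.T
    k_rs = phi_r @ phi_s.T
    eye = torch.eye(k_ss.shape[0], device=k_ss.device)
    preds = k_rs @ torch.linalg.solve(k_ss + ridge * eye, distilled_y_onehot)

    # Treating the regression outputs as logits is a simplifying assumption.
    return F.cross_entropy(preds, real_y)
```

Backpropagating this loss into distilled_x (and, optionally, learnable targets) and stepping an outer optimizer gives the basic training loop.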