Dataset Distillation Meets Provable Subset Selection
- URL: http://arxiv.org/abs/2307.08086v1
- Date: Sun, 16 Jul 2023 15:58:19 GMT
- Title: Dataset Distillation Meets Provable Subset Selection
- Authors: Murad Tukan, Alaa Maalouf, Margarita Osadchy
- Abstract summary: Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important and removing redundant points in the data.
We further merge the idea of data subset selection with dataset distillation, by training the distilled set on "important" sampled points during the training procedure instead of randomly sampling the next batch.
- Score: 14.158845925610438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has grown tremendously over recent years, yielding
state-of-the-art results in various fields. However, training such models
requires huge amounts of data, increasing the computational time and cost. To
address this, dataset distillation was proposed to compress a large training
dataset into a smaller synthetic one that retains its performance -- this is
usually done by (1) uniformly initializing a synthetic set and (2) iteratively
updating/learning this set according to a predefined loss by uniformly sampling
instances from the full data. In this paper, we improve both phases of dataset
distillation: (1) we present a provable, sampling-based approach for
initializing the distilled set by identifying important and removing redundant
points in the data, and (2) we further merge the idea of data subset selection
with dataset distillation, by training the distilled set on "important"
sampled points during the training procedure instead of randomly sampling the
next batch. To do so, we define the notion of importance based on the relative
contribution of instances with respect to two different loss functions, i.e.,
one for the initialization phase (a kernel fitting function for kernel ridge
regression and a K-means-based loss function for any other distillation
method), and the relative cross-entropy loss (or any other predefined loss)
function for the training phase. Finally, we provide experimental results
showing how our method can latch on to existing dataset distillation techniques
and improve their performance.
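To make the two phases above concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation): the distilled set is initialized by sampling points in proportion to their contribution to a K-means objective, and training batches are then drawn in proportion to the current per-sample loss rather than uniformly. All function names, hyperparameters, and the sensitivity-style sampling scheme are assumptions made for illustration.

```python
# Minimal sketch, assuming a K-means-based importance score for initialization
# and loss-proportional batch sampling for training. Not the authors' code.
import numpy as np
from sklearn.cluster import KMeans


def kmeans_importance_init(X, distilled_size, n_clusters=10, seed=0):
    """Pick initial distilled points by sampling in proportion to their
    contribution to the K-means objective (a simple importance proxy)."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    # Squared distance of each point to its nearest cluster center.
    sq_dist = np.min(km.transform(X), axis=1) ** 2
    probs = sq_dist / sq_dist.sum()
    idx = rng.choice(len(X), size=distilled_size, replace=False, p=probs)
    return X[idx].copy(), idx


def importance_batch(per_sample_loss, batch_size, seed=0):
    """Draw a training batch with probability proportional to the current
    per-sample loss (e.g., cross-entropy) instead of uniformly."""
    rng = np.random.default_rng(seed)
    probs = per_sample_loss / per_sample_loss.sum()
    return rng.choice(len(per_sample_loss), size=batch_size, replace=False, p=probs)


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(1000, 32))   # stand-in for real data
    S, chosen = kmeans_importance_init(X, distilled_size=50)
    loss = np.random.default_rng(2).random(1000)            # stand-in for per-sample loss
    batch_idx = importance_batch(loss, batch_size=64)
    print(S.shape, batch_idx.shape)
```

In an actual distillation pipeline, the per-sample loss would be recomputed from the current model each time a batch is drawn, so the sampling distribution tracks which real instances are currently most informative.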
Related papers
- Benchmarking of a new data splitting method on volcanic eruption data [38.85972012552084]
An iterative procedure divides the input dataset of volcanic eruptions into two parts using a dissimilarity index calculated on the cumulative histograms of these two parts.
The proposed model achieves the best performance, albeit with a slightly higher number of epochs.
Each model was trained with early stopping to guard against overfitting; the higher number of epochs in the proposed method indicates that early stopping did not detect overfitting.
arXiv Detail & Related papers (2024-10-08T19:29:46Z)
- Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
The main goal is to further improve the classification accuracy of prototype-based soft-labels distillation.
Experimental studies demonstrate the method's ability to distill the data, as well as its potential to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z)
- Embarassingly Simple Dataset Distillation [0.0]
We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
We devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
arXiv Detail & Related papers (2023-11-13T02:14:54Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression (KRR) based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the random Fourier features (RFF) space coincides with the solution of the original data.
A KRR solution can be generated from this distilled set of instances, which approximates the KRR solution optimized on the full input data (see the KRR sketch after this list).
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Dataset Distillation via Factorization [58.8114016318593]
We introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing dataset distillation (DD) baseline.
HaBa explores decomposing a dataset into two components: data Hallucination networks and Bases.
Our method can yield significant improvement on downstream classification tasks compared with the previous state of the art, while reducing the total number of compressed parameters by up to 65%.
arXiv Detail & Related papers (2022-10-30T08:36:19Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
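Several entries above, as well as the initialization phase of the main paper, involve kernel ridge regression on a distilled or selected subset. The sketch below is an illustrative, self-contained example of that general idea only: it fits KRR on a small subset and measures how closely it tracks the full-data KRR fit. It is not code from any listed paper, and the RBF kernel, regularization strength, and uniform subset choice are assumptions.

```python
# Illustrative sketch: kernel ridge regression (KRR) fit on a small subset
# approximating the full-data KRR fit. Not from any of the listed papers;
# kernel choice, alpha, and the uniformly sampled subset are assumptions.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic regression data standing in for a real training set.
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(2000)

# Full-data KRR solution (the reference model).
full = KernelRidge(alpha=1e-2, kernel="rbf", gamma=1.0).fit(X, y)

# KRR solution on a small subset (here just uniform sampling; the papers
# above construct or learn this subset far more carefully).
idx = rng.choice(len(X), size=100, replace=False)
small = KernelRidge(alpha=1e-2, kernel="rbf", gamma=1.0).fit(X[idx], y[idx])

# Compare predictions of the two models on held-out points.
X_test = rng.uniform(-3, 3, size=(500, 1))
gap = np.mean((full.predict(X_test) - small.predict(X_test)) ** 2)
print(f"mean squared prediction gap between full and subset KRR: {gap:.4f}")
```

A method such as Kernel Inducing Points would replace the uniform subset with a small set constructed specifically to minimize this prediction gap.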