On the Size and Approximation Error of Distilled Sets
- URL: http://arxiv.org/abs/2305.14113v1
- Date: Tue, 23 May 2023 14:37:43 GMT
- Title: On the Size and Approximation Error of Distilled Sets
- Authors: Alaa Maalouf and Murad Tukan and Noel Loo and Ramin Hasani and Mathias Lechner and Daniela Rus
- Abstract summary: We take a theoretical view on kernel ridge regression based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data.
A KRR solution generated from this distilled set of instances approximates the KRR solution optimized on the full input data.
- Score: 57.61696480305911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset Distillation is the task of synthesizing small datasets from large
ones while still retaining comparable predictive accuracy to the original
uncompressed dataset. Despite significant empirical progress in recent years,
there is little theoretical understanding of the limitations and guarantees of
dataset distillation: specifically, what excess risk does distillation incur
compared to training on the original dataset, and how small can distilled
datasets be? In this work, we take a theoretical view on kernel ridge regression
(KRR) based methods of dataset distillation such as Kernel Inducing Points. By
transforming ridge regression into random Fourier features (RFF) space, we
provide the first proof of the existence of small distilled datasets and
their corresponding excess risk for shift-invariant kernels. We prove that a
small set of instances exists in the original input space such that its
solution in the RFF space coincides with the solution of the original data. We
further show that a KRR solution generated from this distilled set of instances
approximates the KRR solution optimized on the full input data. The size of this
set is linear in the dimension of the RFF space of the input set, or
alternatively near-linear in the number of effective
degrees of freedom, which is a function of the kernel, number of datapoints,
and the regularization parameter $\lambda$. The error bound of this distilled
set is also a function of $\lambda$. We verify our bounds analytically and
empirically.
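As a concrete illustration of the pipeline the abstract describes, the NumPy sketch below maps data into random Fourier features approximating a shift-invariant (RBF) kernel and solves ridge regression in that space. The uniform subsample at the end is only a stand-in for the paper's distilled-set construction; all sizes and constants are illustrative.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features approximating a shift-invariant RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2) (Rahimi & Recht, 2007)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def ridge_fit(Z, y, lam):
    """Closed-form ridge regression in feature space:
    w = (Z^T Z + lam * I)^{-1} Z^T y."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
n, d, D, gamma, lam = 2000, 10, 200, 0.5, 1e-1
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # RFF frequencies for this RBF kernel
b = rng.uniform(0, 2 * np.pi, size=D)

Z = rff_features(X, W, b)
w_full = ridge_fit(Z, y, lam)       # KRR-equivalent solution on the full data

# The paper proves a distilled set of size linear in D exists whose RFF
# solution coincides with w_full; the uniform subsample below is only a
# stand-in showing the shape of that computation, not the construction.
idx = rng.choice(n, size=4 * D, replace=False)
w_small = ridge_fit(Z[idx], y[idx], lam)
print(np.linalg.norm(w_full - w_small) / np.linalg.norm(w_full))
```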
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, in particular on small datasets.
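A one-dimensional caricature of this construction may help: indicator (zero-order spline) basis functions with knots placed at the data points induce a data-adaptive kernel, which is then used in ordinary kernel ridge regression. The full method uses a saturated tensor-product expansion over coordinate subsets; everything below is illustrative.

```python
import numpy as np

def zero_order_basis(x, knots):
    """phi_j(x) = 1{x >= knot_j}: zero-order spline (indicator) basis."""
    return (x[:, None] >= knots[None, :]).astype(float)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2, 2, size=200))
y = np.sign(np.sin(2 * x)) + 0.2 * rng.normal(size=200)

Phi = zero_order_basis(x, knots=x)   # knots at the data: data-adaptive basis
K = Phi @ Phi.T                      # kernel induced by the basis inner products
lam = 1e-2
alpha = np.linalg.solve(K + lam * len(x) * np.eye(len(x)), y)  # KRR dual solution
y_hat = K @ alpha                    # in-sample predictions
```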
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
- Embarrassingly Simple Dataset Distillation [0.0]
We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
We devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
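To make the bilevel view concrete, here is a minimal linear-regression variant: the inner problem fits ridge regression on the distilled set in closed form, and the outer loop runs gradient descent on the real-data loss. Only the distilled labels are learned here, which keeps the outer gradient exact; the paper's setting (images, neural models, learned inputs) is far richer, and all constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m, lam, lr = 1000, 20, 32, 1e-2, 1e-1
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Distilled set: fixed synthetic inputs S, learnable labels y_s.
S = rng.normal(size=(m, d))
y_s = np.zeros(m)
M = np.linalg.solve(S.T @ S + lam * np.eye(d), S.T)  # inner solver: w(y_s) = M @ y_s

for _ in range(500):
    w = M @ y_s                        # inner problem: closed-form ridge on (S, y_s)
    r = X @ w - y                      # residual of the outer (real-data) loss
    g = (2.0 / n) * M.T @ (X.T @ r)    # exact outer gradient w.r.t. y_s
    y_s -= lr * g

print(np.mean((X @ (M @ y_s) - y) ** 2))  # outer loss after distillation
```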
arXiv Detail & Related papers (2023-11-13T02:14:54Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Dataset Distillation Meets Provable Subset Selection [14.158845925610438]
Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important points and removing redundant ones from the data.
We further merge the idea of data subset selection with dataset distillation by training the distilled set on "important" sampled points during the training procedure, instead of randomly sampling the next batch.
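A sketch of the sampling-based initialization idea, using ridge leverage scores as a generic stand-in for the paper's importance scores: points are drawn with probability proportional to their score rather than uniformly.

```python
import numpy as np

def ridge_leverage_scores(X, lam):
    """Ridge leverage scores tau_i = x_i^T (X^T X + lam I)^{-1} x_i,
    a standard importance measure from the coreset literature
    (a stand-in for the paper's sensitivity-based scores)."""
    G = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))
    return np.einsum('ij,jk,ik->i', X, G, X)

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 15))
tau = ridge_leverage_scores(X, lam=1e-1)
p = tau / tau.sum()

# Initialize the distilled set from important points instead of uniformly.
m = 50
init_idx = rng.choice(len(X), size=m, replace=False, p=p)
S_init = X[init_idx].copy()
```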
arXiv Detail & Related papers (2023-07-16T15:58:19Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most, based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo)
FRePo achieves state-of-the-art performance with an order of magnitude lower memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
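The core feature-regression step can be caricatured as follows: embed both the real and the distilled inputs with a network, solve kernel ridge regression on the distilled features, and score the resulting predictor on the real data (the quantity backpropagated into the distilled set). The fixed random network and all shapes below are illustrative, and the pooling over model checkpoints is omitted.

```python
import numpy as np

def embed(X, W1, W2):
    """Tiny fixed ReLU feature extractor standing in for a trained network."""
    return np.maximum(X @ W1, 0.0) @ W2

rng = np.random.default_rng(6)
n, m, d, h, p = 500, 20, 30, 64, 16
W1 = rng.normal(size=(d, h)) / np.sqrt(d)
W2 = rng.normal(size=(h, p)) / np.sqrt(h)
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
Xs, ys = rng.normal(size=(m, d)), rng.normal(size=m)   # distilled set (untrained here)

F, Fs = embed(X, W1, W2), embed(Xs, W1, W2)
lam = 1e-3
K_ss = Fs @ Fs.T                                       # kernel among distilled points
alpha = np.linalg.solve(K_ss + lam * np.eye(m), ys)    # KRR fit on the distilled set
pred = (F @ Fs.T) @ alpha                              # predictions on the real data
outer_loss = np.mean((pred - y) ** 2)                  # loss driving the distillation
```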
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
- SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require all the data to be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
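A minimal sketch of the streaming idea (not StreaMRAK itself, which also adapts resolution on the fly): with random Fourier features, ridge regression needs only O(D^2) sufficient statistics, so the raw stream never has to be stored.

```python
import numpy as np

class StreamingRFFRidge:
    """Streaming ridge regression over random Fourier features.
    Keeps only O(D^2) sufficient statistics, never the raw stream."""
    def __init__(self, d, D, gamma, lam, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
        self.b = rng.uniform(0, 2 * np.pi, size=D)
        self.A = lam * np.eye(D)   # accumulates Z^T Z + lam * I
        self.c = np.zeros(D)       # accumulates Z^T y
        self.D = D

    def _features(self, X):
        return np.sqrt(2.0 / self.D) * np.cos(X @ self.W + self.b)

    def partial_fit(self, X, y):
        Z = self._features(X)
        self.A += Z.T @ Z
        self.c += Z.T @ y

    def predict(self, X):
        w = np.linalg.solve(self.A, self.c)
        return self._features(X) @ w

# Feed the stream in batches; memory use stays constant.
model = StreamingRFFRidge(d=2, D=100, gamma=1.0, lam=1e-2)
rng = np.random.default_rng(4)
for _ in range(50):
    Xb = rng.normal(size=(64, 2))
    model.partial_fit(Xb, np.sin(Xb[:, 0]))
```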
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
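For reference, the Nyström approximation analyzed here can be written in a few lines: pick k landmark columns of the kernel matrix and reconstruct K from them. Uniform landmark selection below is the naive baseline; the paper's point is that smarter column subset selection provably does better, with error depending on singular-value decay. Sizes and the RBF parameter are illustrative.

```python
import numpy as np

def rbf(X, Y, gamma):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 5))
gamma, k = 0.2, 40

# Column subset selection: pick k landmark columns (uniform here).
idx = rng.choice(len(X), size=k, replace=False)
C = rbf(X, X[idx], gamma)            # n x k columns of K
W = C[idx]                           # k x k block on the landmarks
K_hat = C @ np.linalg.pinv(W) @ C.T  # Nystrom approximation of K

K = rbf(X, X, gamma)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # relative error
```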
arXiv Detail & Related papers (2020-02-21T00:43:06Z)