Efficient Dataset Distillation Using Random Feature Approximation
- URL: http://arxiv.org/abs/2210.12067v1
- Date: Fri, 21 Oct 2022 15:56:13 GMT
- Title: Efficient Dataset Distillation Using Random Feature Approximation
- Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus
- Abstract summary: We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
- Score: 109.07737733329019
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dataset distillation compresses large datasets into smaller synthetic
coresets which retain performance with the aim of reducing the storage and
computational burden of processing the entire dataset. Today's best-performing
algorithm, \textit{Kernel Inducing Points} (KIP), which makes use of the
correspondence between infinite-width neural networks and kernel-ridge
regression, is prohibitively slow due to the exact computation of the neural
tangent kernel matrix, scaling $O(|S|^2)$, with $|S|$ being the coreset size.
To improve this, we propose a novel algorithm that uses a random feature
approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which
reduces the kernel matrix computation to $O(|S|)$. Our algorithm provides at
least a 100-fold speedup over KIP and can run on a single GPU. Our new method,
termed RFA Distillation (RFAD), performs competitively with KIP and other
dataset condensation algorithms in accuracy over a range of large-scale
datasets, both in kernel regression and finite-width network training. We
demonstrate the effectiveness of our approach on tasks involving model
interpretability and privacy preservation.
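As a rough illustration of the idea (not the paper's implementation, which uses random features of deep convolutional NNGP kernels inside the distillation loop), the sketch below approximates the NNGP kernel of a one-hidden-layer ReLU network with an empirical random-feature map and plugs it into kernel ridge regression: the kernel matrices are assembled from O(|S|) feature evaluations rather than O(|S|^2) exact kernel computations. Names such as rfa_kernel_ridge are illustrative, not from the paper.

```python
import numpy as np

def relu_features(X, W):
    # Empirical NNGP features of a one-hidden-layer ReLU network:
    # phi(x) = relu(W x) / sqrt(D).  For W with i.i.d. N(0, 1) rows,
    # phi(x) . phi(x') converges to E_w[relu(w.x) relu(w.x')], the NNGP
    # covariance of this architecture (an order-1 arc-cosine kernel up to scaling).
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def rfa_kernel_ridge(X_coreset, y_coreset, X_test, n_features=4096, reg=1e-6, seed=0):
    # Kernel ridge regression with a random-feature approximation of the NNGP
    # kernel: the kernel matrices are built from O(|S|) feature evaluations
    # instead of O(|S|^2) exact kernel evaluations.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, X_coreset.shape[1]))  # one random network draw
    Phi_S = relu_features(X_coreset, W)   # |S| x D coreset features
    Phi_T = relu_features(X_test, W)      # n_test x D test features
    K_SS = Phi_S @ Phi_S.T                # approximate coreset-coreset kernel
    K_TS = Phi_T @ Phi_S.T                # approximate test-coreset kernel
    alpha = np.linalg.solve(K_SS + reg * np.eye(len(X_coreset)), y_coreset)
    return K_TS @ alpha                   # kernel ridge regression predictions
```

In the KIP/RFAD setting the coreset itself is the learnable object, optimized by differentiating a loss of this form against the real training data; the sketch above omits that outer loop and the deep feature maps used in the paper.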
Related papers
- On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight approximation.
In this paper, a holistic framework is proposed for model compression from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z) - Kernel Regression with Infinite-Width Neural Networks on Millions of
Examples [27.408712993696213]
We study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
We obtain a test accuracy of 91.2% (SotA for a pure kernel method)
arXiv Detail & Related papers (2023-03-09T17:11:31Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and practice (a minimal random-feature sketch for the related ReLU arc-cosine kernel appears after this list).
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - A New Algorithm for Tessellated Kernel Learning [4.264192013842097]
An ideal set of kernels should: admit a linear parameterization (for tractability); be dense in the set of all kernels (for robustness); be universal (for accuracy)
The recently proposed Tessellated Kernels (TKs) are currently the only known class that meets all three criteria.
By contrast, the 2-step algorithm proposed here scales to 10,000 data points and extends to the regression problem.
arXiv Detail & Related papers (2020-06-13T18:33:31Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of the method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
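Several of the entries above (the NTK sketching and random-feature papers) rest on the same fact that RFAD exploits: the ReLU NNGP kernel has a closed form and can be estimated by Monte-Carlo averaging over random ReLU features. The snippet below is a minimal sketch of that correspondence, not any of the papers' constructions; it compares the closed-form kernel with its random-feature estimate as the number of features grows.

```python
import numpy as np

def relu_nngp_kernel(x, xp):
    # E_w[relu(w.x) relu(w.x')] for w ~ N(0, I): the NNGP covariance of a
    # one-hidden-layer ReLU network with unit weight variance (equal, up to a
    # factor of 2, to the order-1 arc-cosine kernel of Cho & Saul):
    # ||x|| ||x'|| (sin t + (pi - t) cos t) / (2 pi), with t = angle(x, x').
    nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
    cos_t = np.clip(x @ xp / (nx * nxp), -1.0, 1.0)
    t = np.arccos(cos_t)
    return nx * nxp * (np.sin(t) + (np.pi - t) * cos_t) / (2 * np.pi)

def random_feature_estimate(x, xp, n_features, rng):
    # Monte-Carlo estimate with random ReLU features: mean_i relu(w_i.x) relu(w_i.x').
    W = rng.standard_normal((n_features, x.shape[0]))
    return np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ xp, 0.0))

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(32), rng.standard_normal(32)
exact = relu_nngp_kernel(x, xp)
for D in (64, 1024, 16384):
    print(f"D={D:6d}  exact={exact:.4f}  estimate={random_feature_estimate(x, xp, D, rng):.4f}")
```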
This list is automatically generated from the titles and abstracts of the papers on this site.