Provable Data Subset Selection For Efficient Neural Network Training
- URL: http://arxiv.org/abs/2303.05151v1
- Date: Thu, 9 Mar 2023 10:08:34 GMT
- Title: Provable Data Subset Selection For Efficient Neural Network Training
- Authors: Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir
Braverman, Dan Feldman
- Abstract summary: We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
- Score: 73.34254513162898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radial basis function neural networks (\emph{RBFNN}) are well-known for
their capability to approximate any continuous function on a closed bounded set
with arbitrary precision given enough hidden neurons. In this paper, we
introduce the first algorithm to construct coresets for \emph{RBFNNs}, i.e.,
small weighted subsets that approximate the loss of the input data on any
radial basis function network and thus approximate any function defined by an
\emph{RBFNN} on the larger input data. In particular, we construct coresets for
radial basis and Laplacian loss functions. We then use our coresets to obtain a
provable data subset selection algorithm for training deep neural networks.
Since our coresets approximate every function, they also approximate the
gradient of each weight in a neural network, which is a particular function on
the input. We then perform empirical evaluations on function approximation and
dataset subset selection on popular network architectures and data sets,
demonstrating the efficacy and accuracy of our coreset construction.
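Coresets of this kind are usually built by importance (sensitivity) sampling: each point is sampled with probability proportional to a bound on its worst-case contribution to the loss and then reweighted so the sampled sum stays unbiased. The snippet below is a minimal sketch of that general recipe, with a heuristic proxy score standing in for the paper's provable sensitivity bounds for radial-basis and Laplacian losses; all function names are illustrative, not from the paper's code.

```python
import numpy as np

def rbf_features(X, centers, beta):
    """Phi[i, j] = exp(-beta * ||x_i - c_j||^2): the hidden layer of an RBF network."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-beta * d2)

def rbf_loss(X, y, centers, out_w, beta, weights=None):
    """(Weighted) mean squared loss of a fixed RBF network on (X, y)."""
    err = (rbf_features(X, centers, beta) @ out_w - y) ** 2
    return err.mean() if weights is None else np.average(err, weights=weights)

def sampled_coreset(X, y, m, rng):
    """Importance-sample m points with probability proportional to a crude sensitivity
    proxy (distance to the mean) and reweight each pick by 1 / (m * p_i).
    The paper derives provable sensitivity bounds instead of this heuristic proxy."""
    s = np.linalg.norm(X - X.mean(0), axis=1) + 1e-12
    p = s / s.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    return X[idx], y[idx], 1.0 / (m * p[idx])

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 10))
y = np.sin(X[:, 0])
centers = X[rng.choice(5000, 20, replace=False)]
out_w = rng.standard_normal(20)                      # any fixed RBF network
Xc, yc, w = sampled_coreset(X, y, m=500, rng=rng)
print(rbf_loss(X, y, centers, out_w, 0.5),           # full-data loss
      rbf_loss(Xc, yc, centers, out_w, 0.5, w))      # weighted coreset loss should be close
```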
Related papers
- Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is a key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z) - Universal Approximation Property of Fully Convolutional Neural Networks
with Zero Padding [10.295288663157393]
CNNs function as tensor-to-tensor mappings, preserving the spatial structure of input data.
We show that CNNs can approximate arbitrary continuous functions in cases where both the input and output values exhibit the same spatial shape.
We also verify that deep, narrow CNNs possess the UAP as tensor-to-tensor functions.
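As a quick illustration of the tensor-to-tensor view (a hypothetical architecture, not one from the paper): a deep, narrow CNN with zero ("same") padding preserves the spatial shape at every layer, so input and output tensors share the same spatial structure.

```python
import torch
import torch.nn as nn

# Deep, narrow CNN with zero padding: every layer preserves the spatial shape,
# so the network is a tensor-to-tensor map (widths and depth here are arbitrary).
layers = []
for i in range(6):
    layers += [nn.Conv2d(3 if i == 0 else 8, 8, kernel_size=3, padding=1), nn.ReLU()]
layers += [nn.Conv2d(8, 3, kernel_size=3, padding=1)]
cnn = nn.Sequential(*layers)

x = torch.randn(1, 3, 32, 32)
print(x.shape, cnn(x).shape)   # both torch.Size([1, 3, 32, 32]): same spatial structure in and out
```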
arXiv Detail & Related papers (2022-11-18T02:04:16Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
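To give a flavour of the random-feature idea (an illustrative stand-in, not the RFAD pipeline): Monte-Carlo features for the one-layer ReLU NNGP (arc-cosine) kernel approximate the exact kernel matrix without ever evaluating the closed form pairwise.

```python
import numpy as np

def relu_nngp_features(X, n_feat, rng):
    """Monte-Carlo features for the one-layer ReLU NNGP (arc-cosine) kernel
    k(x, y) = E_w[relu(w.x) relu(w.y)], with w ~ N(0, I)."""
    W = rng.standard_normal((X.shape[1], n_feat))
    return np.maximum(X @ W, 0.0) / np.sqrt(n_feat)

def relu_nngp_exact(X):
    """Closed form of the same kernel, for comparison."""
    norms = np.linalg.norm(X, axis=1)
    cos = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
Phi = relu_nngp_features(X, 8192, rng)
K_rf, K_exact = Phi @ Phi.T, relu_nngp_exact(X)
print(np.abs(K_rf - K_exact).max() / np.abs(K_exact).max())   # relative error falls as n_feat grows
```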
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalize to the space of supervised and unsupervised learning problems.
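For context, the Nyström baseline mentioned above can be sketched as follows (kernel choice and landmark count are arbitrary here); NeuralEF replaces this with neural networks trained on its new objectives, so this snippet is only an illustration of the baseline.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def nystrom_eigenfunctions(landmarks, k_top, gamma=0.5):
    """Classical Nystrom estimate of the top-k kernel eigenfunctions from m landmark
    points.  Cost grows with m, which is the scalability issue NeuralEF sidesteps by
    training networks to represent the eigenfunctions instead."""
    m = len(landmarks)
    evals, evecs = np.linalg.eigh(rbf_kernel(landmarks, landmarks, gamma))
    evals, evecs = evals[::-1][:k_top], evecs[:, ::-1][:, :k_top]
    def psi(X):
        # Out-of-sample extension: psi_j(x) = sqrt(m) / lambda_j * sum_i k(x, x_i) * u_ij
        return rbf_kernel(X, landmarks, gamma) @ evecs * (np.sqrt(m) / (evals + 1e-12))
    return psi

rng = np.random.default_rng(0)
landmarks = rng.standard_normal((300, 8))
psi = nystrom_eigenfunctions(landmarks, k_top=5)
print(psi(rng.standard_normal((10, 8))).shape)   # (10, 5): top-5 eigenfunction values
```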
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
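A hedged sketch of the overall idea, using generic random Fourier features for a Gaussian kernel as a stand-in for the sketched CNTK features: once approximate features are available, kernel regression reduces to an ordinary linear (ridge) regressor on those features.

```python
import numpy as np

def gauss_kernel(A, B, gamma):
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def rff_map(d, n_feat, gamma, rng):
    """Rahimi-Recht map z(.) with z(x).z(y) ~ exp(-gamma * ||x - y||^2)."""
    W = rng.standard_normal((d, n_feat)) * np.sqrt(2 * gamma)
    b = rng.uniform(0, 2 * np.pi, n_feat)
    return lambda X: np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
Xtr, Xte = rng.standard_normal((2000, 32)), rng.standard_normal((200, 32))
ytr = np.sin(2 * Xtr[:, 0])
gamma, lam = 0.05, 1e-2

# Exact kernel ridge regression: cost and memory grow quadratically in the number of points.
K, Kte = gauss_kernel(Xtr, Xtr, gamma), gauss_kernel(Xte, Xtr, gamma)
pred_exact = Kte @ np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)

# Linear ridge regression on random features: cost grows only linearly in the number of points.
z = rff_map(32, 2000, gamma, rng)
Z, Zte = z(Xtr), z(Xte)
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ ytr)
print(np.abs(Zte @ w - pred_exact).mean())   # gap shrinks as the feature count grows
```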
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds, both in theory and in practice.
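The Monte-Carlo version of such a feature map can be sketched as below for the NTK of a one-hidden-layer ReLU network, k(x, y) = k1(x, y) + (x·y)·k0(x, y); the paper's construction achieves a much lower feature dimension, so this only conveys the basic idea.

```python
import numpy as np

def relu_ntk_features(X, n_w, rng):
    """Monte-Carlo feature map psi with psi(x).psi(y) -> k1(x,y) + (x.y)*k0(x,y),
    the NTK of a one-hidden-layer ReLU network.  Its dimension, n_w + n_w*d, is far
    larger than the compact maps in the paper; this is only a sketch."""
    n, d = X.shape
    pre = X @ rng.standard_normal((d, n_w))
    act = np.maximum(pre, 0.0) / np.sqrt(n_w)           # -> k1 (arc-cosine order 1)
    gate = (pre > 0).astype(X.dtype) / np.sqrt(n_w)     # -> indicator part of k0
    grad = (gate[:, :, None] * X[:, None, :]).reshape(n, n_w * d)   # carries the (x.y) factor
    return np.concatenate([act, grad], axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5))
Psi = relu_ntk_features(X, 2000, rng)
K_approx = Psi @ Psi.T    # approximate NTK Gram matrix from an explicit feature map
```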
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Estimating Multiplicative Relations in Neural Networks [0.0]
We use properties of logarithmic functions to propose a pair of activation functions that translate products into linear expressions which can be learned using backpropagation.
We generalize this approach to more complex arithmetic functions and test accuracy on a distribution disjoint from the training set.
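As a toy illustration of the log-activation idea (a hypothetical two-input network, not the paper's architecture): taking logarithms turns the product x1·x2 into log x1 + log x2, which a single linear layer can represent and backpropagation can learn; exponentiating recovers the product.

```python
import torch
import torch.nn as nn

class LogExpNet(nn.Module):
    """Toy network with log/exp activations: a product x1^a * x2^b becomes the linear
    expression a*log(x1) + b*log(x2), learnable by ordinary backpropagation.
    (Inputs are assumed positive; the paper handles more general settings.)"""
    def __init__(self, in_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, 1, bias=False)
    def forward(self, x):
        return torch.exp(self.linear(torch.log(x)))

torch.manual_seed(0)
x = torch.rand(1024, 2) + 0.1            # positive inputs
y = x[:, :1] * x[:, 1:]                  # target: the product x1 * x2
model = LogExpNet(2)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(model.linear.weight.detach())      # should approach [1., 1.], i.e. exact multiplication
```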
arXiv Detail & Related papers (2020-10-28T14:28:24Z)