Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies
- URL: http://arxiv.org/abs/2311.13583v1
- Date: Wed, 22 Nov 2023 18:40:18 GMT
- Title: Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies
- Authors: Shabnam Daghaghi, Benjamin Coleman, Benito Geordie, Anshumali
Shrivastava
- Abstract summary: We develop an efficient sketch-based approximation to the Nadaraya-Watson estimator.
Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
- Score: 35.29595714883275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data sampling is an effective method to improve the training speed of neural
networks, with recent results demonstrating that it can even break the neural
scaling laws. These results critically rely on high-quality scores to estimate
the importance of an input to the network. We observe that there are two
dominant strategies: static sampling, where the scores are determined before
training, and dynamic sampling, where the scores can depend on the model
weights. Static algorithms are computationally inexpensive but less effective
than their dynamic counterparts, which can cause end-to-end slowdown due to
their need to explicitly compute losses. To address this problem, we propose a
novel sampling distribution based on nonparametric kernel regression that
learns an effective importance score as the neural network trains. However,
nonparametric regression models are too computationally expensive to accelerate
end-to-end training. Therefore, we develop an efficient sketch-based
approximation to the Nadaraya-Watson estimator. Using recent techniques from
high-dimensional statistics and randomized algorithms, we prove that our
Nadaraya-Watson sketch approximates the estimator with exponential convergence
guarantees. Our sampling algorithm outperforms the baseline in terms of
wall-clock time and accuracy on four datasets.
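For context, the Nadaraya-Watson estimator referenced above is the classical kernel-weighted average m(x) = sum_i K_h(x, x_i) y_i / sum_i K_h(x, x_i). The snippet below is a minimal illustrative sketch of how such a kernel-regression proxy could turn the observed losses of previously visited examples into importance scores and sampling probabilities. It uses the exact (unsketched) estimator with a Gaussian kernel; the helper names (`nadaraya_watson`, `sampling_probabilities`) and the choice of loss as the regression target are assumptions for illustration, and it does not reproduce the paper's sketch-based approximation or its convergence guarantees.

```python
import numpy as np

def nadaraya_watson(queries, keys, values, bandwidth=1.0):
    """Exact Nadaraya-Watson estimator (illustrative, not the paper's sketch):
        m(x) = sum_i K_h(x, x_i) * y_i / sum_i K_h(x, x_i)
    with a Gaussian kernel K_h(x, x_i) = exp(-||x - x_i||^2 / (2 h^2))."""
    # Pairwise squared distances between query points and reference points.
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weights
    return (w @ values) / np.maximum(w.sum(axis=1), 1e-12)

def sampling_probabilities(scores, floor=1e-3):
    """Turn nonnegative importance scores into a sampling distribution,
    with a small floor so no example is starved entirely."""
    p = np.maximum(scores, floor)
    return p / p.sum()

# Toy usage: score candidate training inputs by the estimated loss of
# similar, recently seen examples, then sample a minibatch proportionally.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 16))   # candidate pool (features/embeddings)
X_seen = rng.normal(size=(200, 16))     # recently visited examples
loss_seen = rng.uniform(size=200)       # their observed losses
scores = nadaraya_watson(X_train, X_seen, loss_seen, bandwidth=2.0)
p = sampling_probabilities(scores)
batch = rng.choice(len(X_train), size=64, replace=False, p=p)
```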
Related papers
- Neural Network-Based Score Estimation in Diffusion Models: Optimization
and Generalization [12.812942188697326]
Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness.
A key component of these models is to learn the score function through score matching.
Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy.
arXiv Detail & Related papers (2024-01-28T08:13:56Z)
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
- Confidence-Nets: A Step Towards better Prediction Intervals for regression Neural Networks on small datasets [0.0]
We propose an ensemble method that attempts to estimate the uncertainty of predictions, increase their accuracy and provide an interval for the expected variation.
The proposed method is tested on various datasets, and a significant improvement in the performance of the neural network model is seen.
arXiv Detail & Related papers (2022-10-31T06:38:40Z)
- Deep learning based sferics recognition for AMT data processing in the dead band [5.683853455697258]
In the audio magnetotellurics (AMT) sounding data processing, the absence of sferic signals in some time ranges typically results in a lack of energy in the AMT dead band.
We propose a deep convolutional neural network (CNN) to automatically recognize sferic signals from redundantly recorded data in a long time range.
Our method can significantly improve the signal-to-noise ratio (S/N) and effectively address the lack of energy in the dead band.
arXiv Detail & Related papers (2022-09-22T02:31:28Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAIN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Efficient training of physics-informed neural networks via importance sampling [2.9005223064604078]
Physics-Informed Neural Networks (PINNs) are a class of deep neural networks that are trained to solve systems governed by partial differential equations (PDEs).
We show that an importance sampling approach will improve the convergence behavior of PINNs training.
arXiv Detail & Related papers (2021-04-26T02:45:10Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms are able to train these networks so effectively.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity [57.275753974812666]
Federated learning involves learning from data samples distributed across a network of clients while the data remains local.
In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure.
arXiv Detail & Related papers (2020-12-28T19:21:14Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)