Predicting Training Time Without Training
- URL: http://arxiv.org/abs/2008.12478v1
- Date: Fri, 28 Aug 2020 04:29:54 GMT
- Title: Predicting Training Time Without Training
- Authors: Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika,
Stefano Soatto
- Abstract summary: We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
- Score: 120.92623395389255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of predicting the number of optimization steps that a
pre-trained deep network needs to converge to a given value of the loss
function. To do so, we leverage the fact that the training dynamics of a deep
network during fine-tuning are well approximated by those of a linearized
model. This allows us to approximate the training loss and accuracy at any
point during training by solving a low-dimensional Stochastic Differential
Equation (SDE) in function space. Using this result, we are able to predict the
time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a
given loss without having to perform any training. In our experiments, we are
able to predict training time of a ResNet within a 20% error margin on a
variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost
compared to actual training. We also discuss how to further reduce the
computational and memory cost of our method, and in particular we show that by
exploiting the spectral properties of the gradients' matrix it is possible to
predict training time on a large dataset while processing only a subset of the
samples.
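To make the linearization idea concrete, below is a minimal numpy sketch under simplifying assumptions: full-batch gradient descent, a squared loss, and a linearization fixed at the pre-trained weights. The paper itself models SGD noise with a low-dimensional SDE in function space and handles other losses; the function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def predict_steps_to_loss(J, residual0, lr, target_loss, max_steps=100_000):
    """Predict how many gradient-descent steps a *linearized* model needs
    to bring the squared loss 0.5 * ||f(x) - y||^2 below target_loss.

    J         : (n, p) Jacobian of the network outputs w.r.t. the parameters,
                evaluated once at the pre-trained weights (the linearization point).
    residual0 : (n,) initial residuals f(x_i; w0) - y_i.
    lr        : learning rate of full-batch gradient descent.
    """
    # Gram (NTK-like) matrix in function space; its spectrum governs how fast
    # each error mode decays, so no actual training is needed.
    theta = J @ J.T                          # (n, n)
    eigvals, eigvecs = np.linalg.eigh(theta)

    # Project the initial residual onto the eigenbasis. Under linearized GD,
    # the i-th mode is multiplied by (1 - lr * eigval_i) at every step.
    r = eigvecs.T @ residual0
    decay = 1.0 - lr * eigvals

    for step in range(max_steps + 1):
        loss = 0.5 * np.sum(r ** 2)          # predicted loss at this step
        if loss <= target_loss:
            return step
        r = decay * r
    return None                              # target not reached within max_steps
```

Because the predicted dynamics reduce to a handful of eigenvalues of the Gram matrix, this view also suggests why, as the abstract notes, an approximate spectrum obtained from a subset of the samples can be enough for a coarse training-time estimate.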
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies [35.29595714883275]
We develop an efficient sketch-based approximation to the Nadaraya-Watson estimator (a plain Nadaraya-Watson estimator is sketched after this list for reference).
Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
arXiv Detail & Related papers (2023-11-22T18:40:18Z) - KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z) - On minimizing the training set fill distance in machine learning regression [0.552480439325792]
We study a data selection approach that aims to minimize the fill distance of the selected set.
We show that selecting training sets with farthest point sampling (FPS) can also increase model stability for the specific case of Gaussian kernel regression approaches (a minimal FPS sketch appears after this list).
arXiv Detail & Related papers (2023-07-20T16:18:33Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with backpropagation through time (BPTT).
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potentially severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep
Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - Enabling On-Device CNN Training by Self-Supervised Instance Filtering
and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
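For reference, the adaptive sampling entry above builds on the Nadaraya-Watson estimator. Below is a minimal sketch of the plain (unapproximated) estimator with a Gaussian kernel; this is textbook kernel regression, not that paper's sketch-based approximation, and the names and kernel choice are illustrative.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth=1.0):
    """Plain Nadaraya-Watson kernel regression with a Gaussian kernel:
        f(x) = sum_i K(x, x_i) * y_i / sum_i K(x, x_i)
    """
    # Squared Euclidean distance from the query to every training point.
    d2 = np.sum((x_train - x_query) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
    return np.sum(weights * y_train) / np.sum(weights)
```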
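Similarly, the fill-distance entry above relies on farthest point sampling (FPS), a standard greedy rule: repeatedly add the point farthest from the set selected so far, which greedily keeps the fill distance small. The sketch below is a generic FPS implementation under that assumption, not code from the paper; names are illustrative.

```python
import numpy as np

def farthest_point_sampling(X, k, seed=0):
    """Greedily select k rows of X, each time adding the point farthest
    from the points already selected (keeps the fill distance small)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]                  # arbitrary starting point
    # Distance from every point to its closest selected point so far.
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                     # farthest remaining point
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)
```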
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.