Does Preprocessing Help Training Over-parameterized Neural Networks?
- URL: http://arxiv.org/abs/2110.04622v1
- Date: Sat, 9 Oct 2021 18:16:23 GMT
- Title: Does Preprocessing Help Training Over-parameterized Neural Networks?
- Authors: Zhao Song, Shuo Yang, Ruizhe Zhang
- Abstract summary: We propose two novel preprocessing ideas to bypass the $\Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
- Score: 19.64638346701198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have achieved impressive performance in many areas.
Designing a fast and provable method for training neural networks is a
fundamental question in machine learning.
The classical training method requires $\Omega(mnd)$ cost per iteration for both the
forward and the backward computation, where $m$ is the width of the neural
network and we are given $n$ training points in $d$-dimensional space.
In this paper, we propose two novel preprocessing ideas to bypass this
$\Omega(mnd)$ barrier:
$\bullet$ First, by preprocessing the initial weights of the neural networks,
we can train the neural network in $\widetilde{O}(m^{1-\Theta(1/d)} n d)$ cost
per iteration.
$\bullet$ Second, by preprocessing the input data points, we can train the
neural network in $\widetilde{O} (m^{4/5} nd )$ cost per iteration.
From a technical perspective, our result is a sophisticated combination of
tools from different fields: a greedy-type convergence analysis from
optimization, a sparsity observation from practical work, high-dimensional
geometric search from data structures, and concentration and
anti-concentration bounds from probability. Our results also provide
theoretical insights for a large number of previously established fast
training methods.
In addition, our classical algorithm can be generalized to the quantum
computation model. Interestingly, we obtain a similar sublinear cost per
iteration while avoiding any preprocessing of the initial weights or the input data points.
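To make the per-iteration saving concrete, below is a minimal NumPy sketch of the sparsity idea the abstract alludes to: with a shifted ReLU activation, only a small set of neurons fires on each training point, so a gradient step only needs to touch those rows of the weight matrix. The two-layer squared-loss setup, the names `active_neurons` and `train_step`, and the shift `b` are illustrative assumptions, and the brute-force scan inside `active_neurons` merely stands in for the preprocessed high-dimensional search structure that the paper uses to answer this query in time sublinear in $m$.

```python
import numpy as np

def active_neurons(W, x, b):
    # Placeholder for the preprocessed geometric search structure:
    # return the indices r with <w_r, x> > b. Here it is a brute-force
    # O(md) scan; the paper's preprocessing makes this query sublinear in m.
    return np.flatnonzero(W @ x > b)

def train_step(W, a, X, y, b, lr):
    # One gradient step for the two-layer network
    #   f(x) = sum_r a_r * max(<w_r, x> - b, 0)
    # under squared loss, touching only the neurons that fire.
    n, _ = X.shape
    grad = {}
    for i in range(n):
        S = active_neurons(W, X[i], b)       # firing neurons for x_i
        pred = a[S] @ (W[S] @ X[i] - b)      # forward pass over S only
        err = pred - y[i]
        for r in S:                          # backward pass over S only
            grad[r] = grad.get(r, 0.0) + err * a[r] * X[i]
    for r, g in grad.items():                # update only the touched rows
        W[r] -= lr * g / n
    return W

# Tiny usage example with random data.
rng = np.random.default_rng(0)
m, n, d = 256, 8, 16
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)
W = train_step(W, a, X, y, b=2.0, lr=0.1)
```

Roughly speaking, choosing the shift $b$ on the order of $\sqrt{\log m}$ makes the expected number of firing neurons per point sublinear in $m$; that sparsity, combined with a fast way to find the firing set, is how the $\widetilde{O}(m^{1-\Theta(1/d)} n d)$ and $\widetilde{O}(m^{4/5} n d)$ per-iteration bounds above should be read.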
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Maximal Initial Learning Rates in Deep ReLU Networks [32.157430904535126]
We introduce the maximal initial learning rate $\eta^{\ast}$.
We observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ behaves differently from the maximum learning rate later in training.
arXiv Detail & Related papers (2022-12-14T15:58:37Z) - Bypass Exponential Time Preprocessing: Fast Neural Network Training via
Weight-Data Correlation Preprocessing [16.35997749365563]
State-of-the-art deep neural networks are becoming larger every year to deliver increasing model accuracy.
Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces the training cost per iteration from $\Omega(nmd)$ to $o(nmd)$, but requires exponential time to preprocess either the data or the neural network weights.
We present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly and dynamically detect which neurons fire at each iteration; a sketch of this kind of correlation index follows the related-papers list below.
arXiv Detail & Related papers (2022-11-25T16:40:49Z) - A Sublinear Adversarial Training Algorithm [13.42699247306472]
We analyze the convergence guarantee of the adversarial training procedure on a two-layer neural network with shifted ReLU activation.
We develop an algorithm for adversarial training with time cost $o(mnd)$ per iteration by applying a half-space reporting data structure.
arXiv Detail & Related papers (2022-08-10T15:31:40Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new and alternative view of neural networks as a set of binary search trees, where each iteration corresponds to modifying a small subset of the nodes.
We believe this view has further applications in the design and analysis of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice requires expensive model training in order to predict performance.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Training Multi-Layer Over-Parametrized Neural Network in Subquadratic
Time [12.348083977777833]
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
In this work, we show how to reduce the training cost per iteration.
arXiv Detail & Related papers (2021-12-14T18:13:36Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)