Does Preprocessing Help Training Over-parameterized Neural Networks?
- URL: http://arxiv.org/abs/2110.04622v1
- Date: Sat, 9 Oct 2021 18:16:23 GMT
- Title: Does Preprocessing Help Training Over-parameterized Neural Networks?
- Authors: Zhao Song, Shuo Yang, Ruizhe Zhang
- Abstract summary: We propose two novel preprocessing ideas to bypass the $\Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
- Score: 19.64638346701198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have achieved impressive performance in many areas.
Designing a fast and provable method for training neural networks is a
fundamental question in machine learning.
The classical training method requires $\Omega(mnd)$ cost per iteration for both the
forward and the backward computation, where $m$ is the width of the neural
network and we are given $n$ training points in $d$-dimensional space.
In this paper, we propose two novel preprocessing ideas to bypass this
$\Omega(mnd)$ barrier:
$\bullet$ First, by preprocessing the initial weights of the neural networks,
we can train the neural network in $\widetilde{O}(m^{1-\Theta(1/d)} n d)$ cost
per iteration.
$\bullet$ Second, by preprocessing the input data points, we can train the
neural network in $\widetilde{O} (m^{4/5} nd )$ cost per iteration.
From a technical perspective, our result is a sophisticated combination of
tools from different fields: a greedy-type convergence analysis from
optimization, a sparsity observation from practical work, high-dimensional
geometric search from data structures, and concentration and
anti-concentration bounds from probability. Our results also provide
theoretical insights for a large number of previously established fast
training methods.
In addition, our classical algorithm can be generalized to the quantum
computation model. Interestingly, we obtain a similar sublinear cost per
iteration while avoiding any preprocessing of the initial weights or the input data points.
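To make the per-iteration saving concrete, below is a minimal NumPy sketch of the sparsity idea the abstract alludes to: with a shifted ReLU activation, only a small set of neurons fires on each training point, so a gradient step only needs to touch those rows of the weight matrix. The two-layer squared-loss setup, the names `active_neurons` and `train_step`, and the shift `b` are illustrative assumptions, and the brute-force scan inside `active_neurons` merely stands in for the preprocessed high-dimensional search structure that the paper uses to answer this query in time sublinear in $m$.

```python
import numpy as np

def active_neurons(W, x, b):
    # Placeholder for the preprocessed geometric search structure:
    # return the indices r with <w_r, x> > b. Here it is a brute-force
    # O(md) scan; the paper's preprocessing makes this query sublinear in m.
    return np.flatnonzero(W @ x > b)

def train_step(W, a, X, y, b, lr):
    # One gradient step for the two-layer network
    #   f(x) = sum_r a_r * max(<w_r, x> - b, 0)
    # under squared loss, touching only the neurons that fire.
    n, _ = X.shape
    grad = {}
    for i in range(n):
        S = active_neurons(W, X[i], b)       # firing neurons for x_i
        pred = a[S] @ (W[S] @ X[i] - b)      # forward pass over S only
        err = pred - y[i]
        for r in S:                          # backward pass over S only
            grad[r] = grad.get(r, 0.0) + err * a[r] * X[i]
    for r, g in grad.items():                # update only the touched rows
        W[r] -= lr * g / n
    return W

# Tiny usage example with random data.
rng = np.random.default_rng(0)
m, n, d = 256, 8, 16
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)
W = train_step(W, a, X, y, b=2.0, lr=0.1)
```

Roughly speaking, choosing the shift $b$ on the order of $\sqrt{\log m}$ makes the expected number of firing neurons per point sublinear in $m$; that sparsity, combined with a fast way to find the firing set, is how the $\widetilde{O}(m^{1-\Theta(1/d)} n d)$ and $\widetilde{O}(m^{4/5} n d)$ per-iteration bounds above should be read.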
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Maximal Initial Learning Rates in Deep ReLU Networks [32.157430904535126]
We introduce the maximal initial learning rate $\eta^{\ast}$.
We observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ behaves differently from the maximum learning rate later in training.
arXiv Detail & Related papers (2022-12-14T15:58:37Z) - Bypass Exponential Time Preprocessing: Fast Neural Network Training via
Weight-Data Correlation Preprocessing [16.35997749365563]
State-of-the-art deep neural networks are becoming larger every year to deliver increasing model accuracy.
Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces the training cost per iteration from $\Omega(nmd)$ to $o(nmd)$, but requires exponential time to preprocess either the data or the neural network weights.
We present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly and dynamically detect which neurons fire at each iteration; a sketch of this kind of correlation index follows the related-papers list below.
arXiv Detail & Related papers (2022-11-25T16:40:49Z) - A Sublinear Adversarial Training Algorithm [13.42699247306472]
We analyze the convergence guarantee of the adversarial training procedure on a two-layer neural network with shifted ReLU activation.
We develop an algorithm for adversarial training with time cost $o(mnd)$ per iteration by applying a half-space reporting data structure.
arXiv Detail & Related papers (2022-08-10T15:31:40Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new and alternative view of neural networks as a set of binary search trees, where each iteration corresponds to modifying a small subset of the nodes.
We believe this view has further applications in the design and analysis of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice requires expensive model training in order to predict performance.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Training Multi-Layer Over-Parametrized Neural Network in Subquadratic
Time [12.348083977777833]
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
In this work, we show how to reduce the training cost per iteration.
arXiv Detail & Related papers (2021-12-14T18:13:36Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)