Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics
- URL: http://arxiv.org/abs/2201.04194v1
- Date: Tue, 11 Jan 2022 20:53:15 GMT
- Title: Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics
- Authors: Chunheng Jiang, Tejaswini Pedapati, Pin-Yu Chen, Yizhou Sun, Jianxi
Gao
- Abstract summary: Current practice requires expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
- Score: 85.31710759801705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient model selection for identifying a suitable pre-trained neural
network to a downstream task is a fundamental yet challenging task in deep
learning. Current practice requires expensive computational costs in model
training for performance prediction. In this paper, we propose a novel
framework for neural network selection by analyzing the governing dynamics over
synaptic connections (edges) during training. Our framework is built on the
fact that back-propagation during neural network training is equivalent to the
dynamical evolution of synaptic connections. Therefore, a converged neural
network is associated with an equilibrium state of a networked system composed
of those edges. To this end, we construct a network mapping $\phi$, converting
a neural network $G_A$ to a directed line graph $G_B$ that is defined on those
edges in $G_A$. Next, we derive a neural capacitance metric $\beta_{\rm eff}$
as a predictive measure universally capturing the generalization capability of
$G_A$ on the downstream task using only a handful of early training results. We
carried out extensive experiments using 17 popular pre-trained ImageNet models
and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST
and Birds, to evaluate the fine-tuning performance of our framework. Our neural
capacitance metric is shown to be a powerful indicator for model selection
based only on early training results and is more efficient than
state-of-the-art methods.
Related papers
- NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance [0.0]
We propose a zero-cost proxy Network Expressivity by Activation Rank (NEAR) to identify the optimal neural network without training.
We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS.
arXiv Detail & Related papers (2024-08-16T14:38:14Z) - Demystifying Lazy Training of Neural Networks from a Macroscopic Viewpoint [5.9954962391837885]
We study the gradient descent dynamics of neural networks through the lens of macroscopic limits.
Our study reveals that gradient descent can rapidly drive deep neural networks to zero training loss.
Our approach draws inspiration from the Neural Tangent Kernel (NTK) paradigm.
arXiv Detail & Related papers (2024-04-07T08:07:02Z) - Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for
Adaptive Learning [0.0]
A new type of neural network is presented using the rapid neural network paradigm.
It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy.
arXiv Detail & Related papers (2023-09-12T22:34:34Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Does Preprocessing Help Training Over-parameterized Neural Networks? [19.64638346701198]
We propose two novel preprocessing ideas to bypass the $Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
arXiv Detail & Related papers (2021-10-09T18:16:23Z) - Training Graph Neural Networks by Graphon Estimation [2.5997274006052544]
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with and in many cases outperform the other over-smoothing reducing GNN training methods.
arXiv Detail & Related papers (2021-09-04T19:21:48Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Taylorized Training: Towards Better Approximation of Neural Network
Training at Finite Width [116.69845849754186]
Taylorized training involves training the $k$-th order Taylor expansion of the neural network.
We show that Taylorized training agrees with full neural network training increasingly better as we increase $k$.
We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decay exponentially over $k$ in wide neural networks.
arXiv Detail & Related papers (2020-02-10T18:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.