Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks
- URL: http://arxiv.org/abs/2007.01452v1
- Date: Fri, 3 Jul 2020 01:37:16 GMT
- Title: Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks
- Authors: Cong Fang, Jason D. Lee, Pengkun Yang, Tong Zhang
- Abstract summary: This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
- Score: 54.27962244835622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new mean-field framework for over-parameterized deep
neural networks (DNNs), which can be used to analyze neural network training.
In this framework, a DNN is represented by probability measures and functions
over its features (that is, the function values of the hidden units over the
training data) in the continuous limit, instead of the neural network
parameters as most existing studies have done. This new representation
overcomes the degenerate situation in which all the hidden units in each middle
layer essentially collapse into a single meaningful hidden unit, and further leads to a
simpler representation of DNNs, for which the training objective can be
reformulated as a convex optimization problem via suitable re-parameterization.
Moreover, we construct a non-linear dynamics called neural feature flow, which
captures the evolution of an over-parameterized DNN trained by Gradient
Descent. We illustrate the framework via the standard DNN and the Residual
Network (Res-Net) architectures. Furthermore, we show that, for Res-Net, when the
neural feature flow process converges, it reaches a global minimal solution
under suitable conditions. Our analysis leads to the first global convergence
proof for over-parameterized neural network training with more than $3$ layers
in the mean-field regime.
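To make the abstract's central objects concrete, the sketch below (an illustration under stated assumptions, not the paper's construction) trains a small over-parameterized two-layer network by plain gradient descent with a mean-field 1/width output scaling and records, at every step, the network's features, i.e. the hidden-unit values on the training data, rather than the raw weights. The trajectory of this feature matrix is a finite-width, discrete-time analogue of the neural feature flow; the two-layer setup, the tanh nonlinearity, and all names (`width`, `lr`, etc.) are illustrative choices.
```python
# Minimal illustrative sketch (assumption-laden; not the paper's construction):
# track the evolution of the *features* -- hidden-unit values on the training
# data -- of an over-parameterized two-layer network trained by gradient
# descent with a mean-field 1/width output scaling.
import numpy as np

rng = np.random.default_rng(0)
n, d, width, lr, steps = 32, 5, 1024, 0.2, 200   # illustrative sizes

X = rng.standard_normal((n, d))                  # toy training inputs
y = np.sin(X[:, 0])                              # toy regression targets

W = rng.standard_normal((d, width))              # hidden-layer weights
a = rng.standard_normal(width)                   # output weights

def forward(X, W, a):
    H = np.tanh(X @ W)                           # features: n x width matrix
    return H, H @ a / width                      # mean-field averaged readout

features = []                                    # discrete-time "feature flow"
for _ in range(steps):
    H, pred = forward(X, W, a)
    err = (pred - y) / n                         # gradient of 0.5 * mean squared error
    grad_a = H.T @ err / width
    grad_W = X.T @ ((err[:, None] * a[None, :] / width) * (1.0 - H ** 2))
    # Scale the step by width so individual units move at O(1) speed
    # (feature learning rather than lazy/NTK-style training).
    a -= lr * width * grad_a
    W -= lr * width * grad_W
    features.append(H.copy())

_, pred = forward(X, W, a)
print("final training MSE:", float(np.mean((pred - y) ** 2)))
print("feature drift ||H_T - H_0||_F:", float(np.linalg.norm(features[-1] - features[0])))
```
In the paper's continuous limit, such feature matrices (together with probability measures over hidden units) replace the weights as the state of the training dynamics; the sketch above only tracks their finite-width counterpart.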
Related papers
- Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization [3.3998740964877463]
"Local linear recovery" (LLR) is a weaker form of target function recovery.
We prove that functions expressible by narrower DNNs are guaranteed to be recoverable from a number of samples smaller than the number of model parameters.
arXiv Detail & Related papers (2024-06-26T03:08:24Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- An Optimal Time Variable Learning Framework for Deep Neural Networks [0.0]
The proposed framework can be applied to any of the existing networks such as ResNet, DenseNet or Fractional-DNN.
The proposed approach is applied to an ill-posed 3D-Maxwell's equation.
arXiv Detail & Related papers (2022-04-18T19:29:03Z)
- A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural networks often get trapped in local minima during training.
The new kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)