Proximal Mean Field Learning in Shallow Neural Networks
- URL: http://arxiv.org/abs/2210.13879v3
- Date: Sat, 16 Dec 2023 00:41:06 GMT
- Title: Proximal Mean Field Learning in Shallow Neural Networks
- Authors: Alexis Teter, Iman Nodozi, Abhishek Halder
- Abstract summary: We propose a custom learning algorithm for shallow neural networks, i.e., networks with a single hidden layer of infinite width.
We realize mean field learning as a computational algorithm, rather than as an analytical tool.
Our algorithm performs gradient descent of the free energy associated with the risk functional.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a custom learning algorithm for shallow over-parameterized neural
networks, i.e., networks with a single hidden layer of infinite width. The
infinite width of the hidden layer serves as an abstraction for the
over-parameterization. Building on the recent mean field interpretations of
learning dynamics in shallow neural networks, we realize mean field learning as
a computational algorithm, rather than as an analytical tool. Specifically, we
design a Sinkhorn regularized proximal algorithm to approximate the
distributional flow for the learning dynamics over weighted point clouds. In
this setting, a contractive fixed point recursion computes the time-varying
weights, numerically realizing the interacting Wasserstein gradient flow of the
parameter distribution supported over the neuronal ensemble. An appealing
aspect of the proposed algorithm is that the measure-valued recursions allow
meshless computation. We demonstrate the proposed computational framework of
interacting weighted particle evolution on binary and multi-class
classification. Our algorithm performs gradient descent of the free energy
associated with the risk functional.
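As a rough illustration of the Sinkhorn machinery the abstract refers to, the sketch below runs plain Sinkhorn fixed-point iterations for entropy-regularized optimal transport between two weighted point clouds. This is not the authors' implementation: the paper wraps an inner loop of this kind inside a proximal recursion, and every size and parameter here is an illustrative assumption.

```python
import numpy as np

# Hedged sketch: Sinkhorn fixed-point iterations for entropy-regularized
# optimal transport between two weighted point clouds (meshless: only the
# particles and their weights are stored, no grid).

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))      # source point cloud
y = rng.normal(size=(50, 2))      # target point cloud
a = np.full(40, 1 / 40)           # source weights (probability vector)
b = np.full(50, 1 / 50)           # target weights

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
eps = 0.5                                           # entropic regularization
K = np.exp(-C / eps)                                # Gibbs kernel

u = np.ones(40)
v = np.ones(50)
for _ in range(1000):             # contractive fixed-point (Sinkhorn) loop
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]   # transport plan coupling the two clouds
row_err = np.abs(P.sum(axis=1) - a).max()
col_err = np.abs(P.sum(axis=0) - b).max()
```

At convergence both marginals of `P` match the prescribed weights, which is the fixed-point property that a proximal recursion over weighted point clouds can exploit.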
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) are open to novel perspectives.
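Points (i) and (ii) can be illustrated with a minimal, assumed example: gradient-based learning written as the flow dw/dt = -grad L(w) and integrated with hand-rolled forward Euler steps, so no external ODE solver is involved. The linear least-squares model is a stand-in, not the paper's Hamiltonian formulation.

```python
import numpy as np

# Hedged sketch: gradient descent recovered by forward-Euler integration of
# the gradient-flow ODE dw/dt = -grad L(w), with no external solver.

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                    # noiseless targets for a linear model

w = np.zeros(3)
dt = 0.05                         # Euler step size, i.e. the learning rate
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(X)  # gradient of the mean squared error
    w = w - dt * grad                  # one explicit Euler step of the flow

err = np.abs(w - w_true).max()
```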
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
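The parameters-as-graph idea can be made concrete with a small, assumed sketch: a two-layer MLP's weight matrices flattened into an edge list whose nodes are neurons and whose edge attributes are the weights. This is only the kind of computational-graph encoding a GNN could consume; the authors' actual encoding is richer (biases, architecture metadata, equivariance structure).

```python
import numpy as np

# Hedged sketch: an MLP's parameters represented as a graph,
# nodes = neurons, edges = individual weights.

rng = np.random.default_rng(4)
W1 = rng.normal(size=(3, 4))   # input -> hidden weights
W2 = rng.normal(size=(4, 2))   # hidden -> output weights

nodes = ([f"in{i}" for i in range(3)]
         + [f"h{j}" for j in range(4)]
         + [f"out{k}" for k in range(2)])

# One edge per weight: (source neuron, target neuron, weight value).
edges = [(f"in{i}", f"h{j}", W1[i, j]) for i in range(3) for j in range(4)]
edges += [(f"h{j}", f"out{k}", W2[j, k]) for j in range(4) for k in range(2)]

n_edges = len(edges)   # equals the weight count (biases omitted here)
```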
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Enhanced quantum state preparation via stochastic prediction of neural network [0.8287206589886881]
In this paper, we explore an intriguing avenue for enhancing algorithm effectiveness by exploiting the knowledge blindness of neural networks.
Our approach centers around a machine learning algorithm utilized for preparing arbitrary quantum states in a semiconductor double quantum dot system.
By leveraging predictions generated by the neural network, we are able to guide the optimization process to escape local optima.
arXiv Detail & Related papers (2023-07-27T09:11:53Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require the generation of additional negative samples.
In our framework, each block can be trained independently, so it can be easily deployed in parallel acceleration systems.
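The block-local training idea described for CaFo can be sketched as follows, under loud assumptions: block bodies here are fixed random ReLU projections and each block gets its own logistic head trained only on its local output, with no gradients flowing between blocks. The real CaFo blocks and losses differ.

```python
import numpy as np

# Hedged sketch: cascaded blocks trained independently (no cross-block BP),
# each with a local classifier; predictions are ensembled at the end.

rng = np.random.default_rng(3)
n = 200
X = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def train_local_head(H, y, steps=500, lr=0.1):
    """Logistic head trained only on this block's own features H."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = sigmoid(H @ w)
        w -= lr * H.T @ (p - y) / len(y)   # local cross-entropy gradient
    return w

R1 = rng.normal(size=(2, 16))
R2 = rng.normal(size=(16, 16))
H1 = np.maximum(X @ R1, 0.0)    # block 1 features (nothing backpropagates here)
H2 = np.maximum(H1 @ R2, 0.0)   # block 2 features, fed forward only

w1 = train_local_head(H1, y)    # each block trained independently
w2 = train_local_head(H2, y)

p_ens = 0.5 * (sigmoid(H1 @ w1) + sigmoid(H2 @ w2))  # average block outputs
acc = ((p_ens > 0.5) == (y == 1)).mean()
```

Because each head depends only on its own block's features, the two training loops could run on separate devices, which is the parallelization claim above.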
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z) - Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z) - Local Extreme Learning Machines and Domain Decomposition for Solving Linear and Nonlinear Partial Differential Equations [0.0]
We present a neural network-based method for solving linear and nonlinear partial differential equations.
The method combines the ideas of extreme learning machines (ELM), domain decomposition and local neural networks.
We compare the current method with the deep Galerkin method (DGM) and the physics-informed neural network (PINN) in terms of accuracy and computational cost.
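The ELM ingredient alone can be illustrated with a minimal, assumed example: a single-domain ELM collocation solve of the 1D Poisson problem u''(x) = f(x), u(0) = u(1) = 0, whose exact solution is u(x) = sin(pi x). Hidden-layer weights are random and fixed; only the linear output weights are obtained by least squares. The paper's method additionally uses domain decomposition and local networks, which this sketch omits.

```python
import numpy as np

# Hedged sketch: ELM collocation for u'' = f on [0, 1] with zero boundary
# values. Random tanh features; least squares on the output weights only.

rng = np.random.default_rng(2)
n_hidden, n_col = 60, 120
w = rng.uniform(-8, 8, n_hidden)      # fixed random input weights
b = rng.uniform(-8, 8, n_hidden)      # fixed random biases

x = np.linspace(0.0, 1.0, n_col)
phi = np.tanh(np.outer(x, w) + b)                   # hidden features phi_j(x)
phi_xx = -2.0 * (w ** 2) * phi * (1.0 - phi ** 2)   # analytic d2/dx2 of tanh

f = -np.pi ** 2 * np.sin(np.pi * x)   # right-hand side for u = sin(pi x)

# Stack PDE residual rows and the two boundary rows, then solve least squares.
A = np.vstack([phi_xx, phi[[0]], phi[[-1]]])
rhs = np.concatenate([f, [0.0, 0.0]])
beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)

u = phi @ beta
max_err = np.abs(u - np.sin(np.pi * x)).max()
```

Since only `beta` is trained, the whole solve is one linear least-squares problem, which is where ELM's speed advantage over iteratively trained PINN/DGM models comes from.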
arXiv Detail & Related papers (2020-12-04T23:19:39Z) - Event-Based Backpropagation can compute Exact Gradients for Spiking Neural Networks [0.0]
Spiking neural networks combine analog computation with event-based communication using discrete spikes.
For the first time, this work derives the backpropagation algorithm for a continuous-time spiking neural network and a general loss function.
We use gradients computed via EventProp to train networks on the Yin-Yang and MNIST datasets using either a spike-time or voltage-based loss function and report competitive performance.
arXiv Detail & Related papers (2020-09-17T15:45:00Z) - A Shooting Formulation of Deep Learning [19.51427735087011]
We introduce a shooting formulation which shifts the perspective from parameterizing a network layer-by-layer to parameterizing over optimal networks.
For scalability, we propose a novel particle-ensemble parametrization which fully specifies the optimal weight trajectory of the continuous-depth neural network.
arXiv Detail & Related papers (2020-06-18T07:36:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all generated summaries) and is not responsible for any consequences.