Variational Inference for Infinitely Deep Neural Networks
- URL: http://arxiv.org/abs/2209.10091v1
- Date: Wed, 21 Sep 2022 03:54:34 GMT
- Title: Variational Inference for Infinitely Deep Neural Networks
- Authors: Achille Nazaret, David Blei
- Abstract summary: unbounded depth neural network (UDN)
We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data.
We study the UDN on real and synthetic data.
- Score: 0.4061135251278187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the unbounded depth neural network (UDN), an infinitely deep
probabilistic model that adapts its complexity to the training data. The UDN
contains an infinite sequence of hidden layers and places an unbounded prior on
a truncation L, the layer from which it produces its data. Given a dataset of
observations, the posterior UDN provides a conditional distribution of both the
parameters of the infinite neural network and its truncation. We develop a
novel variational inference algorithm to approximate this posterior, optimizing
a distribution of the neural network weights and of the truncation depth L, and
without any upper limit on L. To this end, the variational family has a special
structure: it models neural network weights of arbitrary depth, and it
dynamically creates or removes free variational parameters as its distribution
of the truncation is optimized. (Unlike heuristic approaches to model search,
it is solely through gradient-based optimization that this algorithm explores
the space of truncations.) We study the UDN on real and synthetic data. We find
that the UDN adapts its posterior depth to the dataset complexity; it
outperforms standard neural networks of similar computational complexity; and
it outperforms other approaches to infinite-depth neural networks.
Related papers
- Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z) - Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss [2.07180164747172]
We compare deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers.
We find that a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs.
arXiv Detail & Related papers (2024-01-31T20:10:10Z) - Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks [0.16385815610837165]
Sparse deep learning addresses challenges by recovering a sparse representation of the underlying target function.
Deep neural architectures compressed via structured sparsity provide low latency inference, higher data throughput, and reduced energy consumption.
We propose structurally sparse Bayesian neural networks which prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors.
arXiv Detail & Related papers (2023-08-17T17:14:18Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Over-parameterised Shallow Neural Networks with Asymmetrical Node
Scaling: Global Convergence Guarantees and Feature Learning [23.47570704524471]
We consider optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter.
We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime.
arXiv Detail & Related papers (2023-02-02T10:40:06Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Convergence rates for gradient descent in the training of
overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches.
It is still unclear why randomly gradient descent algorithms reach their limits.
arXiv Detail & Related papers (2021-02-23T18:17:47Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Rectified Linear Postsynaptic Potential Function for Backpropagation in
Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) usetemporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective to design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.