Stochastic Neural Networks with Infinite Width are Deterministic
- URL: http://arxiv.org/abs/2201.12724v1
- Date: Sun, 30 Jan 2022 04:52:31 GMT
- Title: Stochastic Neural Networks with Infinite Width are Deterministic
- Authors: Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing,
Masahito Ueda
- Abstract summary: We study stochastic neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
- Score: 7.07065078444922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work theoretically studies stochastic neural networks, a main type of
neural network in use. Specifically, we prove that as the width of an optimized
stochastic neural network tends to infinity, its predictive variance on the
training set decreases to zero. Two common examples that our theory applies to
are neural networks with dropout and variational autoencoders. Our result helps
us better understand how stochasticity affects the learning of neural networks
and thus guides the design of better architectures for practical problems.
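In symbols, the claim is that the variance of the prediction over the sampled noise (dropout masks, VAE latents) vanishes at every training point as the width grows. Below is a minimal numpy sketch of the averaging mechanism, not the paper's proof: it trains only the readout of a random-feature dropout network by minimizing the expected dropout loss in closed form, then reports the dropout-induced predictive variance on the training set.

```python
# Hedged numpy sketch (illustrative, not the paper's construction).
# For inverted dropout with keep probability p, the expected squared
# error over masks splits into a fit term plus a variance penalty:
#   E[(yhat - y)^2] = (h.w - y)^2 + (1-p)/p * sum_i h_i^2 w_i^2,
# a ridge-like problem in the readout w with a closed-form optimum.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))             # 20 training inputs
y = np.sin(X @ rng.normal(size=5))       # arbitrary smooth targets
p = 0.5                                  # dropout keep probability

for width in [64, 256, 1024, 4096]:
    W1 = rng.normal(size=(5, width)) / np.sqrt(5)
    H = np.maximum(X @ W1, 0.0)                   # fixed ReLU features
    reg = (1 - p) / p * np.diag((H ** 2).sum(0))  # dropout-induced ridge
    w = np.linalg.solve(H.T @ H + reg, H.T @ y)   # optimal readout
    var = (1 - p) / p * (H ** 2) @ (w ** 2)       # per-point variance
    print(f"width={width:5d}  mean predictive variance={var.mean():.6f}")
```

As the width grows, the optimum spreads the fit across more units, each weight shrinks, and the dropout-induced variance vanishes, which is the behavior the theorem predicts in far greater generality.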
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
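For intuition on the encoding this entry describes, here is a minimal sketch of one plausible parameters-as-graph construction (my own illustration, not the paper's exact scheme): one graph node per neuron, one edge per weight, with biases stored as node features, so networks of different shapes become graphs a GNN can process uniformly.

```python
# Illustrative sketch (assumption: a plain MLP, not the paper's exact
# encoding): flatten per-layer weight matrices into a single edge list.
import numpy as np

def mlp_to_graph(weights, biases):
    """weights[l]: (n_in, n_out) matrix; biases[l]: (n_out,) vector."""
    offsets = np.cumsum([0] + [W.shape[0] for W in weights])
    edges, edge_feats = [], []
    node_feats = np.zeros(offsets[-1] + weights[-1].shape[1])
    for l, (W, b) in enumerate(zip(weights, biases)):
        src0, dst0 = offsets[l], offsets[l] + W.shape[0]
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                edges.append((src0 + i, dst0 + j))   # one edge per weight
                edge_feats.append(W[i, j])
        node_feats[dst0:dst0 + W.shape[1]] = b       # bias as node feature
    return np.array(edges), np.array(edge_feats), node_feats

rng = np.random.default_rng(0)
ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
bs = [rng.normal(size=4), rng.normal(size=2)]
edges, efeat, nfeat = mlp_to_graph(ws, bs)
print(edges.shape, efeat.shape, nfeat.shape)   # (20, 2) (20,) (9,)
```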
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
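The symmetry in question is easy to verify directly. A quick check on a generic two-layer MLP (my own illustration, not the paper's code): permuting the hidden neurons together with the matching rows and columns of the adjacent weight matrices leaves the network's function unchanged.

```python
# Verify the hidden-neuron permutation symmetry of a two-layer MLP.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=4)

f = lambda W1, b1, W2: np.maximum(x @ W1 + b1, 0.0) @ W2

perm = rng.permutation(8)                       # reorder hidden units
out = f(W1, b1, W2)
out_perm = f(W1[:, perm], b1[perm], W2[perm, :])
print(np.allclose(out, out_perm))               # True
```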
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
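As a concrete picture of the temporal and neuronal sparsity mentioned above, here is a textbook leaky integrate-and-fire neuron (a generic model, not the paper's regression framework): the unit emits a binary spike only when its membrane potential crosses a threshold, so most time steps produce no output.

```python
# Minimal leaky integrate-and-fire (LIF) neuron; generic textbook model.
import numpy as np

def lif(inputs, tau=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for i in inputs:
        v = tau * v + i                 # leaky integration of input
        fired = v >= threshold
        spikes.append(int(fired))
        if fired:
            v = 0.0                     # reset after a spike
    return spikes

rng = np.random.default_rng(0)
print(lif(rng.uniform(0, 0.5, size=20)))  # sparse 0/1 spike train
```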
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework of neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z) - Fourier Neural Networks for Function Approximation [2.840363325289377]
It has been extensively proved that neural networks are universal approximators.
It is specifically proved that for a narrow neural network to approximate a function otherwise implemented by a deep neural network, the narrow network requires an exponentially large number of neurons.
arXiv Detail & Related papers (2021-10-21T09:30:26Z) - Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions [4.932130498861987]
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than standard feed-forward networks.
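To make "adaptive activation function" concrete, here is a hedged sketch in the spirit of that framework (the parameterization below is illustrative, not necessarily the paper's exact KNN form): the fixed nonlinearity is replaced by a trainable mixture of scaled base activations.

```python
# Illustrative adaptive activation: a learnable mixture of scaled
# tanh terms; a and c would be trained jointly with the weights.
import numpy as np

def adaptive_activation(x, a, c):
    """phi(x) = sum_k c_k * tanh(a_k * x)."""
    return sum(ck * np.tanh(ak * x) for ak, ck in zip(a, c))

x = np.linspace(-2, 2, 5)
a = np.array([1.0, 2.0, 4.0])   # per-term slopes (trainable)
c = np.array([0.5, 0.3, 0.2])   # per-term mixing weights (trainable)
print(adaptive_activation(x, a, c))
```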
arXiv Detail & Related papers (2021-05-20T04:54:57Z) - Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
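The kernel this last entry describes can be written down directly for a small network. Below is a minimal construction of an empirical Jacobian (NTK-style) kernel, k(x, x') = J(x) J(x')^T with J the gradient of the output with respect to all parameters; this is my own sketch of the idea, not the paper's implementation.

```python
# Jacobian kernel for a tiny one-hidden-layer tanh network,
# f(x) = w2 . tanh(W1 x), with gradients computed by hand.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 3))
w2 = rng.normal(size=16) / 4.0

def param_jacobian(x):
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1 - h ** 2), x)     # d f / d W1
    return np.concatenate([dW1.ravel(), h])  # [d f/d W1, d f/d w2]

def jacobian_kernel(x, xp):
    return param_jacobian(x) @ param_jacobian(xp)

x1, x2 = rng.normal(size=3), rng.normal(size=3)
print(jacobian_kernel(x1, x2))   # scalar kernel value
```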
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.