Mathematical Models of Overparameterized Neural Networks
- URL: http://arxiv.org/abs/2012.13982v1
- Date: Sun, 27 Dec 2020 17:48:31 GMT
- Title: Mathematical Models of Overparameterized Neural Networks
- Authors: Cong Fang and Hanze Dong and Tong Zhang
- Abstract summary: We will focus on the analysis of two-layer neural networks, and explain the key mathematical models.
We will then discuss challenges in understanding deep neural networks and some current research directions.
- Score: 25.329225766892126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has achieved considerable empirical success in recent years.
However, while many ad hoc tricks have been discovered by practitioners, until
recently there has been little theoretical understanding of the tricks invented
in the deep learning literature. Practitioners have long known that
overparameterized neural networks are easy to train, and in the past few years
there have been important theoretical developments in the analysis of such
networks. In particular, it was shown that these
systems behave like convex systems under various restricted settings, such as
for two-layer NNs, and when learning is restricted locally in the so-called
neural tangent kernel space around specialized initializations. This paper
discusses some of this recent progress, which has led to a significantly better
understanding of neural networks. We will focus on the analysis of two-layer
neural networks, and explain the key mathematical models, with their
algorithmic implications. We will then discuss challenges in understanding deep
neural networks and some current research directions.
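The following is a rough, self-contained sketch of the regime described above (not code from the paper): a wide two-layer ReLU network trained by gradient descent barely moves from its random initialization, and its fit on the training data is closely tracked by kernel regression with the empirical neural tangent kernel at initialization. The widths, step sizes, and toy data below are illustrative assumptions.
```python
# Illustrative sketch (not code from the paper): in the NTK / "lazy" regime, a wide
# two-layer ReLU network trained by gradient descent stays close to its random
# initialization, and its fit is closely tracked by kernel regression with the
# empirical neural tangent kernel at initialization.
# All sizes, step sizes, and the toy data below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 2048, 40                       # input dim, hidden width, sample size
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(size=(m, d))                 # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m)         # fixed second layer (common simplification)

def f(X, W):
    """Two-layer ReLU network with 1/sqrt(m) output scaling."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def ntk_features(X, W):
    """Per-example gradient of f with respect to the first-layer weights, flattened."""
    act = (X @ W.T > 0).astype(float) * a / np.sqrt(m)            # (n, m)
    return (act[:, :, None] * X[:, None, :]).reshape(len(X), -1)  # (n, m*d)

# Kernel regression with the empirical NTK at initialization (fit the residual y - f0).
Phi = ntk_features(X, W)
K = Phi @ Phi.T
f0 = f(X, W)
ntk_pred = f0 + K @ np.linalg.solve(K + 1e-8 * np.eye(n), y - f0)

# Full-batch gradient descent on the first-layer weights from the same initialization.
W_t, lr = W.copy(), 1.0
for _ in range(5000):
    resid = f(X, W_t) - y                                  # (n,)
    act = (X @ W_t.T > 0).astype(float) * a / np.sqrt(m)   # (n, m)
    W_t -= lr * (act * resid[:, None]).T @ X / n           # gradient of 0.5 * mean squared error

print("max |GD fit - NTK kernel fit| :", np.abs(f(X, W_t) - ntk_pred).max())
print("relative weight movement      :", np.linalg.norm(W_t - W) / np.linalg.norm(W))
```
Increasing the width m in this sketch should drive both printed quantities toward zero, which is the "lazy training" picture that the convexity results described above make precise.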
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend the residual neural network (ResNet) to Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z)
- Deep Learning Meets Sparse Regularization: A Signal Processing Perspective [17.12783792226575]
We present a mathematical framework that characterizes the functional properties of neural networks that are trained to fit to data.
Key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory.
This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and why neural networks can perform well in high-dimensional problems.
arXiv Detail & Related papers (2023-01-23T17:16:21Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework for neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z)
- Neural Tangent Kernel Analysis of Deep Narrow Neural Networks [11.623483126242478]
We present the first trainability guarantee of infinitely deep but narrow neural networks.
We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
arXiv Detail & Related papers (2022-02-07T07:27:02Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Explainable artificial intelligence for mechanics: physics-informing neural networks for constitutive models [0.0]
In mechanics, the new and active field of physics-informed neural networks attempts to mitigate the black-box character of deep learning by designing deep neural networks on the basis of mechanical knowledge.
We propose a first step towards a physics-informing approach, which explains neural networks trained on mechanical data a posteriori.
Therein, principal component analysis decorrelates the distributed representations in the cell states of RNNs and allows comparison to known, fundamental functions (a minimal PCA sketch appears after this list).
arXiv Detail & Related papers (2021-04-20T18:38:52Z)
- On Interpretability of Artificial Neural Networks: A Survey [21.905647127437685]
We systematically review recent studies on understanding the mechanisms of neural networks and describe applications of interpretability, especially in medicine.
We discuss future directions of interpretability research, such as in relation to fuzzy logic and brain science.
arXiv Detail & Related papers (2020-01-08T13:40:42Z)
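As referenced in the explainable-AI-for-mechanics entry above, the following is a minimal, self-contained sketch (synthetic data only, not code from that paper) of decorrelating the cell states of a recurrent model with principal component analysis and comparing the leading components with known reference functions. The toy recurrence, sizes, and candidate functions below are assumptions.
```python
# Minimal PCA-on-recurrent-states illustration (synthetic data, not code from the
# paper): decorrelate the hidden/cell states of a toy recurrent model and compare
# the leading principal components with known reference functions.
import numpy as np

rng = np.random.default_rng(0)
T, H = 200, 32                                   # time steps, hidden units
t = np.linspace(0.0, 4.0 * np.pi, T)
drive = np.sin(t)                                # assumed input signal (e.g. a loading path)

# Toy recurrent update producing a (T, H) matrix of "cell states".
W_in = rng.normal(scale=0.5, size=H)
W_rec = rng.normal(scale=0.1, size=(H, H))
states = np.zeros((T, H))
h = np.zeros(H)
for k in range(T):
    h = np.tanh(drive[k] * W_in + W_rec @ h)
    states[k] = h

# PCA via SVD of the centered state matrix; columns of `scores` are decorrelated.
centered = states - states.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                                   # (T, H) principal component scores

# Compare the leading component with candidate reference functions
# (sign and ordering of principal components are arbitrary).
candidates = {"sin(t)": np.sin(t), "cos(t)": np.cos(t), "t": t}
for name, ref in candidates.items():
    r = np.corrcoef(scores[:, 0], ref)[0, 1]
    print(f"corr(PC1, {name}) = {r:+.3f}")
print(f"explained variance ratio of PC1: {S[0]**2 / (S**2).sum():.3f}")
```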
This list is automatically generated from the titles and abstracts of the papers on this site.