Statistical Guarantees for Approximate Stationary Points of Simple
Neural Networks
- URL: http://arxiv.org/abs/2205.04491v1
- Date: Mon, 9 May 2022 18:09:04 GMT
- Title: Statistical Guarantees for Approximate Stationary Points of Simple
Neural Networks
- Authors: Mahsa Taheri, Fang Xie, Johannes Lederer
- Abstract summary: We develop statistical guarantees for simple neural networks that coincide up to logarithmic factors with the global optima.
We make a step forward in describing the practical properties of neural networks in mathematical terms.
- Score: 4.254099382808598
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Since statistical guarantees for neural networks are usually restricted to
global optima of intricate objective functions, it is not clear whether these
theories really explain the performances of actual outputs of neural-network
pipelines. The goal of this paper is, therefore, to bring statistical theory
closer to practice. We develop statistical guarantees for simple neural
networks that coincide up to logarithmic factors with the global optima but
apply to stationary points and the points nearby. These results support the
common notion that neural networks do not necessarily need to be optimized
globally from a mathematical perspective. More generally, despite being limited
to simple neural networks for now, our theories make a step forward in
describing the practical properties of neural networks in mathematical terms.
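To make the object of these guarantees concrete, here is a minimal, self-contained sketch of reaching an approximate stationary point of a simple one-hidden-layer ReLU network. The data, widths, step size, and tolerance are illustrative assumptions and not taken from the paper; the returned point is only required to have a small gradient, not to be a global optimum, which is exactly the kind of output the guarantees above are meant to cover.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data and a one-hidden-layer ReLU network: f(x) = w1 . relu(W0 x)
    n, d, width = 200, 5, 10
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

    W0 = rng.normal(size=(width, d)) / np.sqrt(d)
    w1 = rng.normal(size=width) / np.sqrt(width)

    def loss_and_grads(W0, w1):
        H = np.maximum(X @ W0.T, 0.0)                    # hidden activations, shape (n, width)
        r = H @ w1 - y                                   # residuals
        loss = 0.5 * np.mean(r ** 2)
        g_w1 = H.T @ r / n
        g_W0 = ((r[:, None] * w1) * (H > 0)).T @ X / n   # chain rule through the ReLU
        return loss, g_W0, g_w1

    # Plain gradient descent, stopped at an eps-approximate stationary point,
    # i.e. a parameter vector whose full gradient has norm at most eps.
    eps, lr = 1e-3, 0.1
    for step in range(20000):
        loss, g_W0, g_w1 = loss_and_grads(W0, w1)
        grad_norm = np.sqrt((g_W0 ** 2).sum() + (g_w1 ** 2).sum())
        if grad_norm < eps:
            break
        W0 -= lr * g_W0
        w1 -= lr * g_w1

    print(f"loss = {loss:.4f}, gradient norm = {grad_norm:.2e}, steps = {step}")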
Related papers
- Interpreting Neural Networks through Mahalanobis Distance [0.0]
This paper introduces a theoretical framework that connects neural network linear layers with the Mahalanobis distance.
Although this work is theoretical and does not include empirical data, the proposed distance-based interpretation has the potential to enhance model robustness, improve generalization, and provide more intuitive explanations of neural network decisions.
arXiv Detail & Related papers (2024-10-25T07:21:44Z)
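For reference alongside the Mahalanobis-distance entry above, a toy computation of the distance of a point from the empirical distribution of a layer's outputs might look as follows; the synthetic activations and dimensions are placeholders, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(500, 8))       # stand-in for the outputs of one linear layer
    mu = Z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))

    def mahalanobis(x):
        # distance of x from the empirical distribution of the layer outputs
        diff = x - mu
        return float(np.sqrt(diff @ cov_inv @ diff))

    print(mahalanobis(Z[0]))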
- Jaynes Machine: The universal microstructure of deep neural networks [0.9086679566009702]
We predict that all highly connected layers of deep neural networks have a universal microstructure of connection strengths that is distributed lognormally ($LN(\mu, \sigma)$).
Under ideal conditions, the theory predicts that $\mu$ and $\sigma$ are the same for all layers in all networks.
arXiv Detail & Related papers (2023-10-10T19:22:01Z)
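A quick way to probe the lognormal claim in the Jaynes Machine entry is to fit $LN(\mu, \sigma)$ to a layer's connection strengths. In the sketch below the weights array is synthetic stand-in data rather than weights taken from a trained network.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Stand-in data: in practice this would be the absolute connection strengths
    # of one trained dense layer, flattened into a vector.
    weights = rng.lognormal(mean=-1.0, sigma=0.8, size=10000)

    shape, loc, scale = stats.lognorm.fit(weights, floc=0.0)
    mu_hat, sigma_hat = np.log(scale), shape            # parameters of LN(mu, sigma)
    print(f"mu approx {mu_hat:.3f}, sigma approx {sigma_hat:.3f}")

    # Goodness-of-fit check against the fitted lognormal
    ks = stats.kstest(weights, "lognorm", args=(shape, loc, scale))
    print(ks.statistic, ks.pvalue)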
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Correlation between entropy and generalizability in a neural network [9.223853439465582]
We use the Wang-Landau Monte Carlo algorithm to calculate the entropy at a given test accuracy.
Our results show that entropic forces aid generalizability (a toy Wang-Landau sketch follows the arXiv link below).
arXiv Detail & Related papers (2022-07-05T12:28:13Z)
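Wang-Landau Monte Carlo estimates the density of states $g(E)$, whose logarithm is the entropy referred to in the entry above. The toy sketch below runs it on binary strings whose "energy" is simply the number of ones, so the answer is a known binomial coefficient and can be checked; the paper's actual setting, where the role of $E$ is played by a test-accuracy level, is not reproduced here.

    import numpy as np
    from math import lgamma

    rng = np.random.default_rng(0)
    N = 20                                  # binary degrees of freedom
    state = rng.integers(0, 2, N)
    E = int(state.sum())                    # toy "energy": number of ones
    log_g = np.zeros(N + 1)                 # running estimate of ln g(E)
    hist = np.zeros(N + 1)
    f = 1.0                                 # ln of the modification factor

    while f > 1e-4:
        for _ in range(10000):
            i = rng.integers(N)
            E_new = E + (1 - 2 * state[i])  # a single flip changes E by +/- 1
            # Wang-Landau rule: accept with probability min(1, g(E) / g(E_new))
            if np.log(rng.random()) < log_g[E] - log_g[E_new]:
                state[i] ^= 1
                E = E_new
            log_g[E] += f
            hist[E] += 1
        if hist.min() > 0.8 * hist.mean():  # flat-histogram criterion
            f /= 2.0
            hist[:] = 0

    exact = np.array([lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1) for k in range(N + 1)])
    log_g -= log_g[0] - exact[0]            # remove the arbitrary additive constant
    print(np.round(log_g - exact, 2))       # should be close to zero everywhere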
- Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework for neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
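The linearization discussed in that entry is the first-order Taylor expansion of the network output in its parameters. The sketch below compares a tiny tanh network with this expansion around its initialization; the architecture, sizes, and the finite-difference Jacobian are illustrative choices, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    d, width = 3, 50
    W0 = rng.normal(size=(width, d)) / np.sqrt(d)
    w1 = rng.normal(size=width) / np.sqrt(width)

    def f(params, x):
        W, v = params
        return float(v @ np.tanh(W @ x))

    def flatten(params):
        W, v = params
        return np.concatenate([W.ravel(), v])

    def unflatten(theta):
        return theta[: width * d].reshape(width, d), theta[width * d:]

    def jac(theta, x, h=1e-5):
        # numerical gradient of f with respect to the flattened parameters
        g = np.zeros_like(theta)
        for i in range(theta.size):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += h
            tm[i] -= h
            g[i] = (f(unflatten(tp), x) - f(unflatten(tm), x)) / (2 * h)
        return g

    theta0 = flatten((W0, w1))
    x = rng.normal(size=d)
    theta = theta0 + 0.01 * rng.normal(size=theta0.size)   # small parameter move

    exact = f(unflatten(theta), x)
    linear = f(unflatten(theta0), x) + jac(theta0, x) @ (theta - theta0)
    print(f"exact = {exact:.5f}, linearized = {linear:.5f}")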
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks could meet the constraints.
A key ingredient in building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the infinite-width limit of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
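At the level of a single finite-dimensional update, Langevin dynamics is a gradient step plus suitably scaled Gaussian noise. The sketch below uses a toy quadratic loss with an arbitrary step size and inverse temperature, purely to illustrate the update rule whose infinite-dimensional analogue the entry analyzes.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad(theta):                 # gradient of the toy loss L(theta) = 0.5 * ||theta||^2
        return theta

    eta, beta, dim = 1e-2, 50.0, 10  # step size, inverse temperature, dimension
    theta = rng.normal(size=dim)
    samples = []

    for t in range(20000):
        noise = rng.normal(size=dim)
        theta = theta - eta * grad(theta) + np.sqrt(2 * eta / beta) * noise
        if t > 5000:
            samples.append(theta.copy())

    # For this loss the stationary law is N(0, 1/beta), so the spread should be near beta**-0.5
    print(np.std(samples), beta ** -0.5)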
- Statistical Guarantees for Regularized Neural Networks [4.254099382808598]
We develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer.
Our results establish a mathematical basis for regularized estimation of neural networks.
arXiv Detail & Related papers (2020-05-30T15:28:47Z)
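The least-squares-plus-regularizer estimators in that entry can be written down directly; in the sketch below the network, data, l1-type penalty, and tuning parameter lam are illustrative placeholders rather than the paper's specific choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, width = 100, 4, 8
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    lam = 0.1                                            # regularization level

    def objective(W0, w1):
        pred = np.maximum(X @ W0.T, 0.0) @ w1            # one-hidden-layer ReLU network
        least_squares = np.sum((y - pred) ** 2)
        regularizer = np.abs(W0).sum() + np.abs(w1).sum()
        return least_squares + lam * regularizer

    print(objective(rng.normal(size=(width, d)), rng.normal(size=width)))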
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.