The Separation Capacity of Random Neural Networks
- URL: http://arxiv.org/abs/2108.00207v1
- Date: Sat, 31 Jul 2021 10:25:26 GMT
- Title: The Separation Capacity of Random Neural Networks
- Authors: Sjoerd Dirksen, Martin Genzel, Laurent Jacques, Alexander Stollenwerk
- Abstract summary: We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
- Score: 78.25060223808936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks with random weights appear in a variety of machine learning
applications, most prominently as the initialization of many deep learning
algorithms and as a computationally cheap alternative to fully learned neural
networks. In the present article we enhance the theoretical understanding of
random neural nets by addressing the following data separation problem: under
what conditions can a random neural network make two classes $\mathcal{X}^-,
\mathcal{X}^+ \subset \mathbb{R}^d$ (with positive distance) linearly
separable? We show that a sufficiently large two-layer ReLU-network with
standard Gaussian weights and uniformly distributed biases can solve this
problem with high probability. Crucially, the number of required neurons is
explicitly linked to geometric properties of the underlying sets
$\mathcal{X}^-, \mathcal{X}^+$ and their mutual arrangement. This
instance-specific viewpoint allows us to overcome the usual curse of
dimensionality (exponential width of the layers) in non-pathological situations
where the data carries low-complexity structure. We quantify the relevant
structure of the data in terms of a novel notion of mutual complexity (based on
a localized version of Gaussian mean width), which leads to sound and
informative separation guarantees. We connect our result with related lines of
work on approximation, memorization, and generalization.
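For background, the (global) Gaussian mean width of a bounded set $T \subset \mathbb{R}^d$ is $w(T) = \mathbb{E}\, \sup_{x \in T} \langle g, x \rangle$ with $g \sim \mathcal{N}(0, \mathrm{Id}_d)$; the mutual complexity introduced in the paper is built from a localized variant of this quantity. As a concrete illustration of the separation statement, the following minimal sketch (not code from the paper; the radii, noise level, bias range, and width are arbitrary choices) passes two classes that are not linearly separable in the input space through a single random ReLU layer with standard Gaussian weights and uniformly distributed biases, and then checks that a linear classifier separates the resulting features.

```python
# Minimal sketch (numpy only): two classes with positive distance that are not
# linearly separable in R^2 become linearly separable after one random ReLU layer
# with i.i.d. standard Gaussian weights and biases uniform on [-lam, lam].
# All constants below are illustrative choices, not the ones from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_per_class, width, lam = 2, 100, 400, 3.0

def ring(radius, n):
    """Sample n noisy points from a circle of the given radius in R^2."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    pts = radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return pts + 0.05 * rng.standard_normal((n, d))

X = np.vstack([ring(1.0, n_per_class), ring(2.0, n_per_class)])  # X^- inner, X^+ outer
y = np.hstack([-np.ones(n_per_class), np.ones(n_per_class)])

# Random two-layer ReLU feature map Phi(x) = relu(Wx + b).
W = rng.standard_normal((width, d))
b = rng.uniform(-lam, lam, size=width)
Phi = np.maximum(X @ W.T + b, 0.0)  # shape (2 * n_per_class, width)

# Separability certificate: a hyperplane in feature space that interpolates the
# +/-1 labels (min-norm least squares) and classifies every point correctly.
w_feat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
separable_after = bool(np.all(np.sign(Phi @ w_feat) == y))

# Sanity check in input space: the best affine least-squares classifier fails,
# as it must, since no hyperplane separates two concentric rings.
A = np.hstack([X, np.ones((len(X), 1))])
w_in, *_ = np.linalg.lstsq(A, y, rcond=None)
separable_before = bool(np.all(np.sign(A @ w_in) == y))

print("separable before random ReLU layer:", separable_before)  # expected: False
print("separable after  random ReLU layer:", separable_after)   # expected: True
```

Since the width in this toy example exceeds the number of sample points, separability in feature space is unsurprising; the paper's guarantee is sharper, tying the required number of neurons to the geometry of $\mathcal{X}^-$ and $\mathcal{X}^+$ (their mutual complexity) rather than to the number of points.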
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- Memorization with neural nets: going beyond the worst case [5.662924503089369]
In practice, deep neural networks are often able to easily interpolate their training data.
For real-world data, however, one intuitively expects the presence of a benign structure, so that interpolation already occurs at a smaller network size than worst-case memorization capacity suggests.
We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time.
arXiv Detail & Related papers (2023-09-30T10:06:05Z)
- Nonparametric regression using over-parameterized shallow ReLU neural networks [10.339057554827392]
We show that neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes.
It is assumed that the regression function is from the Hölder space with smoothness $\alpha < (d+3)/2$ or a variation space corresponding to shallow neural networks.
As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks.
arXiv Detail & Related papers (2023-06-14T07:42:37Z)
- Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable [1.2277343096128712]
This paper establishes that, given sufficient width, a randomly initialized one-layer neural network can transform two sets into two linearly separable sets without any training.
arXiv Detail & Related papers (2022-05-24T01:38:43Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks [16.64116123743938]
We consider the problem of finding a two-layer neural network with sigmoid or rectified linear unit (ReLU) activations that fits the training data.
We then leverage our bounds to establish guarantees for such networks through the fat-shattering dimension.
Notably, our bounds also have good sample complexity (polynomial in $d$ with low degree).
arXiv Detail & Related papers (2021-03-02T17:36:03Z)
- Compressive Sensing and Neural Networks from a Statistical Learning Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a structure suitable for neural networks.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.