Randomly Initialized One-Layer Neural Networks Make Data Linearly
Separable
- URL: http://arxiv.org/abs/2205.11716v2
- Date: Sun, 8 Oct 2023 20:23:56 GMT
- Title: Randomly Initialized One-Layer Neural Networks Make Data Linearly
Separable
- Authors: Promit Ghosal, Srinath Mahankali, Yihang Sun
- Abstract summary: This paper establishes that, given sufficient width, a randomly initialized one-layer neural network can, with high probability, transform two sets into two linearly separable sets without any training.
- Score: 1.2277343096128712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, neural networks have demonstrated remarkable capabilities in
mapping two arbitrary sets to two linearly separable sets. The prospect of
achieving this with randomly initialized neural networks is particularly
appealing due to the computational efficiency compared to fully trained
networks. This paper contributes by establishing that, given sufficient width,
a randomly initialized one-layer neural network can, with high probability,
transform two sets into two linearly separable sets without any training.
Moreover, we furnish precise bounds on the necessary width of the neural
network for this phenomenon to occur. Our initial bound exhibits exponential
dependence on the input dimension while maintaining polynomial dependence on
all other parameters. In contrast, our second bound is independent of input
dimension, effectively surmounting the curse of dimensionality. The main tools
used in our proof rely heavily on a fusion of geometric principles and
concentration of random matrices.
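To make the claim concrete, below is a minimal numerical sketch (an illustration only, not the paper's construction or its width bounds): two concentric circles, which are not linearly separable in the input space, are passed through a randomly initialized one-layer ReLU network, and a linear classifier is then fit on the resulting random features to check separability. The chosen width, initialization scales, and the use of scikit-learn's LinearSVC as the separability check are assumptions of this sketch.

```python
# Minimal sketch (not the paper's construction): empirically checking that a
# wide, randomly initialized one-layer ReLU network tends to make two
# well-separated point sets linearly separable. Width, scales, and the
# synthetic data below are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Two classes on concentric circles in R^2: not linearly separable in input space.
n, d, width = 200, 2, 2000
theta = rng.uniform(0, 2 * np.pi, size=n)
X_inner = 1.0 * np.c_[np.cos(theta), np.sin(theta)]
X_outer = 2.0 * np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([X_inner, X_outer])
y = np.r_[np.zeros(n), np.ones(n)]

# Randomly initialized one-layer network: phi(x) = ReLU(Wx + b), no training.
W = rng.normal(size=(d, width)) / np.sqrt(d)
b = rng.uniform(-1.0, 1.0, size=width)
features = np.maximum(X @ W + b, 0.0)

# A linear classifier on the random features; perfect training accuracy
# indicates the two transformed sets are linearly separable.
clf = LinearSVC(C=1e6, max_iter=20000).fit(features, y)
print("training accuracy on random features:", clf.score(features, y))
```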
Related papers
- Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training [1.7205106391379021]
A neural network with ReLU activations may be viewed as a composition of piecewise linear functions.
We introduce a novel training strategy that forces the network to exhibit a number of linear regions exponential in depth.
This approach allows us to learn approximations of convex, one-dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts.
arXiv Detail & Related papers (2023-11-29T19:09:48Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Linear approximability of two-layer neural networks: A comprehensive
analysis based on spectral decay [4.042159113348107]
We first consider the case of a single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Vector-output ReLU Neural Network Problems are Copositive Programs:
Convex Analysis of Two Layer Networks and Polynomial-time Algorithms [29.975118690758126]
We describe the semi-infinite global dual of the two-layer vector-output ReLU neural network training problem.
We provide a solution which is guaranteed to be exact for certain classes of problems.
arXiv Detail & Related papers (2020-12-24T17:03:30Z) - Connecting Weighted Automata, Tensor Networks and Recurrent Neural
Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z) - Eigendecomposition-Free Training of Deep Networks for Linear
Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)