A theory of capacity and sparse neural encoding
- URL: http://arxiv.org/abs/2102.10148v1
- Date: Fri, 19 Feb 2021 20:24:50 GMT
- Title: A theory of capacity and sparse neural encoding
- Authors: Pierre Baldi, Roman Vershynin
- Abstract summary: We study sparse neural maps from an input layer to a target layer with sparse activity.
We mathematically prove that $K$ undergoes a phase transition and that in general, sparsity in the target layers increases the storage capacity of the map.
- Score: 15.000818334408805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by biological considerations, we study sparse neural maps from an
input layer to a target layer with sparse activity, and specifically the
problem of storing $K$ input-target associations $(x,y)$, or memories, when the
target vectors $y$ are sparse. We mathematically prove that $K$ undergoes a
phase transition and that in general, and somewhat paradoxically, sparsity in
the target layers increases the storage capacity of the map. The target vectors
can be chosen arbitrarily, including in random fashion, and the memories can be
both encoded and decoded by networks trained using local learning rules,
including the simple Hebb rule. These results are robust under a variety of
statistical assumptions on the data. The proofs rely on elegant properties of
random polytopes and sub-gaussian random vectors. Open problems and
connections to capacity theories and polynomial threshold maps are discussed.
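As a concrete illustration of the storage problem described above, here is a minimal numerical sketch (not taken from the paper) of storing $K$ input-target associations with the simple Hebb rule and decoding the sparse targets by thresholding; the layer sizes, the $\pm 1$ input statistics, the sparsity level, and the threshold are all illustrative assumptions.

```python
# Minimal sketch (not from the paper): one-shot Hebbian storage of K
# input-target pairs (x, y) with sparse targets, recalled by thresholding W x.
# Layer sizes, +/-1 input statistics, sparsity, and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 200      # input and target layer sizes
K = 50               # number of stored associations (memories)
s = 5                # active units per sparse target vector

X = rng.choice([-1.0, 1.0], size=(K, n))      # dense +/-1 input patterns
Y = np.zeros((K, m))                          # sparse binary target patterns
for k in range(K):
    Y[k, rng.choice(m, size=s, replace=False)] = 1.0

W = Y.T @ X / n                               # local Hebbian (outer-product) rule

def recall(x, theta=0.5):
    """Decode a stored sparse target from its input by thresholding W @ x."""
    return (W @ x > theta).astype(float)

errors = sum(np.any(recall(X[k]) != Y[k]) for k in range(K))
print(f"memories recalled with errors: {errors} / {K}")
```

At these sizes the crosstalk between memories stays well below the threshold, so recall is essentially error-free; pushing $K$ much higher eventually produces recall errors, which is the capacity question the phase transition in the abstract addresses.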
Related papers
- Sliding down the stairs: how correlated latent variables accelerate learning with neural networks [8.107431208836426]
We show that correlations between latent variables along directions encoded in different input cumulants speed up learning from higher-order correlations.
Our results are confirmed in simulations of two-layer neural networks.
arXiv Detail & Related papers (2024-04-12T17:01:25Z) - Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z) - Learning Narrow One-Hidden-Layer ReLU Networks [30.63086568341548]
We give the first algorithm for this problem that succeeds whenever $k$ is a constant.
We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together.
arXiv Detail & Related papers (2023-04-20T17:53:09Z) - On the Identifiability and Estimation of Causal Location-Scale Noise
Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Locality defeats the curse of dimensionality in convolutional
teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove, for a large class of activation functions, that if the model memorizes even a fraction of the training set, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - Binary autoencoder with random binary weights [0.0]
It is shown that the sparse activation of the hidden layer arises naturally in order to preserve information between layers.
With a large enough hidden layer, it is possible to get zero reconstruction error for any input just by varying the thresholds of neurons.
The model is similar to the olfactory perception system of the fruit fly, and the presented theoretical results give useful insights toward understanding more complex neural networks.
arXiv Detail & Related papers (2020-04-30T12:13:19Z) - A Neural Scaling Law from the Dimension of the Data Manifold [8.656787568717252]
When data is plentiful, the loss achieved by well-trained neural networks scales as a power law $L \propto N^{-\alpha}$ in the number of network parameters $N$.
The scaling law can be explained if neural models are effectively just performing regression on a data manifold of intrinsic dimension $d$.
This simple theory predicts scaling exponents $\alpha \approx 4/d$ for cross-entropy and mean-squared error losses (see the short sketch after this list).
arXiv Detail & Related papers (2020-04-22T19:16:06Z)
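As a quick illustration of the scaling-law summary above, the hypothetical snippet below evaluates the predicted exponent $\alpha \approx 4/d$ for a few intrinsic dimensions $d$, together with the loss reduction implied by doubling the parameter count $N$; the chosen dimensions are arbitrary.

```python
# Illustrative only: predicted scaling exponent alpha = 4/d and the loss
# ratio L(2N)/L(N) = 2**(-alpha) implied by L proportional to N**(-alpha).
for d in (2, 4, 8, 16):          # hypothetical intrinsic data-manifold dimensions
    alpha = 4 / d
    print(f"d={d:2d}  alpha={alpha:.2f}  L(2N)/L(N)={2 ** -alpha:.2f}")
```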
This list is automatically generated from the titles and abstracts of the papers on this site.