Capacity of the Hebbian-Hopfield network associative memory
- URL: http://arxiv.org/abs/2403.01907v1
- Date: Mon, 4 Mar 2024 10:10:23 GMT
- Title: Capacity of the Hebbian-Hopfield network associative memory
- Authors: Mihailo Stojnic
- Abstract summary: In \cite{Hop82}, Hopfield introduced a \emph{Hebbian} learning rule based neural network model and suggested how it can efficiently operate as an associative memory. We study two well-known notions of the patterns' basins of attraction: \textbf{\emph{(i)}} the AGS one from \cite{AmiGutSom85}; and \textbf{\emph{(ii)}} the NLT one from \cite{Newman88,Louk94,Louk94a,Louk97,Tal98}.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In \cite{Hop82}, Hopfield introduced a \emph{Hebbian} learning rule based
neural network model and suggested how it can efficiently operate as an
associative memory. Studying random binary patterns, he also uncovered that, if
a small fraction of errors is tolerated in the retrieval of the stored patterns, the
capacity of the network (maximal number of memorized patterns, $m$) scales
linearly with each pattern's size, $n$. Moreover, he famously predicted
$\alpha_c=\lim_{n\rightarrow\infty}\frac{m}{n}\approx 0.14$. We study this very
same scenario with two well-known notions of the patterns' basins of attraction:
\textbf{\emph{(i)}} The AGS one from \cite{AmiGutSom85}; and
\textbf{\emph{(ii)}} The NLT one from
\cite{Newman88,Louk94,Louk94a,Louk97,Tal98}. Relying on the \emph{fully lifted
random duality theory} (fl RDT) from \cite{Stojnicflrdt23}, we obtain the
following explicit capacity characterizations on the first level of lifting:
\begin{equation}
\alpha_c^{(AGS,1)} = \left ( \max_{\delta\in \left ( 0,\frac{1}{2}\right )
}\frac{1-2\delta}{\sqrt{2} \mbox{erfinv} \left ( 1-2\delta\right )} -
\frac{2}{\sqrt{2\pi}} e^{-\left ( \mbox{erfinv}\left ( 1-2\delta \right )\right
)^2}\right )^2 \approx \mathbf{0.137906} \end{equation}
\begin{equation}
\alpha_c^{(NLT,1)} = \frac{\mbox{erf}(x)^2}{2x^2}-1+\mbox{erf}(x)^2 \approx
\mathbf{0.129490}, \quad 1-\mbox{erf}(x)^2-
\frac{2\mbox{erf}(x)e^{-x^2}}{\sqrt{\pi}x}+\frac{2e^{-2x^2}}{\pi}=0.
\end{equation}
Substantial numerical work on the second level of lifting gives
$\alpha_c^{(AGS,2)} \approx \mathbf{0.138186}$ and $\alpha_c^{(NLT,2)} \approx
\mathbf{0.12979}$, uncovering a remarkably fast lifting convergence. Moreover,
the obtained AGS characterizations exactly match the
replica symmetry based ones of \cite{AmiGutSom85} and the corresponding
symmetry breaking ones of \cite{SteKuh94}.
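To make the setup concrete, here is a minimal sketch (ours, not from the paper) of the Hebbian-Hopfield associative memory the abstract describes: $m$ random binary patterns of size $n$ are stored via the Hebbian rule $W=\frac{1}{n}\sum_{\mu} x_\mu x_\mu^{\top}$, and retrieval iterates sign updates from a corrupted pattern. The sizes, corruption level, and synchronous update schedule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 40  # pattern size n and pattern count m; alpha = m/n = 0.08, below ~0.14

# m random binary (+/-1) patterns, one per row
X = rng.choice([-1, 1], size=(m, n))

# Hebbian learning rule: W = (1/n) * sum_mu x_mu x_mu^T, with zero diagonal
W = (X.T @ X) / n
np.fill_diagonal(W, 0.0)

# Corrupt 10% of the first stored pattern's bits
x = X[0].astype(float).copy()
flip = rng.choice(n, size=n // 10, replace=False)
x[flip] *= -1

# Retrieval via synchronous sign updates (a simple illustrative schedule)
for _ in range(20):
    x = np.sign(W @ x)
    x[x == 0] = 1.0

# Overlap near 1 means the corrupted input was attracted back to the pattern
overlap = float(x @ X[0]) / n
print(f"overlap with stored pattern: {overlap:.3f}")
```

Below the capacity threshold, a run like this typically recovers the stored pattern with overlap close to 1; pushing $m/n$ past the critical $\alpha_c$ makes retrieval fail.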
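As a quick numerical cross-check (our sketch, not part of the paper), the two first-level characterizations displayed above can be evaluated directly with SciPy; the optimizer bounds and the root bracket are assumptions chosen by inspection.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.special import erf, erfinv

# AGS, first level of lifting: maximize over delta in (0, 1/2), then square
def ags_inner(delta):
    t = erfinv(1.0 - 2.0 * delta)
    return ((1.0 - 2.0 * delta) / (np.sqrt(2.0) * t)
            - 2.0 / np.sqrt(2.0 * np.pi) * np.exp(-t ** 2))

res = minimize_scalar(lambda d: -ags_inner(d),
                      bounds=(1e-8, 0.5 - 1e-8), method="bounded")
alpha_ags1 = ags_inner(res.x) ** 2
print(f"alpha_c^(AGS,1) = {alpha_ags1:.6f}")  # ~ 0.137906

# NLT, first level: x solves the stated transcendental equation
def nlt_eq(x):
    return (1.0 - erf(x) ** 2
            - 2.0 * erf(x) * np.exp(-x ** 2) / (np.sqrt(np.pi) * x)
            + 2.0 * np.exp(-2.0 * x ** 2) / np.pi)

x_star = brentq(nlt_eq, 0.5, 5.0)  # bracket contains the sign change
alpha_nlt1 = erf(x_star) ** 2 / (2.0 * x_star ** 2) - 1.0 + erf(x_star) ** 2
print(f"alpha_c^(NLT,1) = {alpha_nlt1:.6f}")  # ~ 0.129490
```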
Related papers
- On the quantum Guerra-Morato Action Functional
Given a smooth potential $W:\mathrm{T}^n \to \mathbb{R}$ on the torus, the quantum Guerra-Morato action functional is defined.
An expression for the second variation of a critical solution is derived.
arXiv Detail & Related papers (2024-03-09T10:30:21Z)
- Exact objectives of random linear programs and mean widths of random polyhedrons
We consider \emph{random linear programs} (rlps) as a subclass of \emph{random optimization problems} (rops).
Our particular focus is on appropriate linear objectives which connect the rlps to the mean widths of random polyhedrons/polytopes.
arXiv Detail & Related papers (2024-03-06T11:51:52Z)
- Provably learning a multi-head attention layer
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm
This paper considers RMSProp and its momentum extension and establishes an $O(\frac{\sqrt{d}}{T^{1/4}})$ convergence rate for the averaged $\ell_1$ gradient norm $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x_k)\|_1\right]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x_k)\|_2^2\right] \leq O(\frac{1}{\sqrt{T}})$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z)
- Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions.
Our results show that $\{ f(\mathbf{x}_t) - f(\mathbf{x}_\infty) \}_{t \in \mathbb{N}}$ can converge faster than $\{ \|\mathbf{x}_t - \mathbf{x}_\infty\| \}_{t \in \mathbb{N}}$.
arXiv Detail & Related papers (2022-10-31T00:53:17Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Low-degree learning and the metric entropy of polynomials
We prove that any (deterministic or randomized) algorithm which learns $\mathscr{F}_{n,d}$ with $L_2$-accuracy $\varepsilon$ requires at least $\Omega((1-\sqrt{\varepsilon})\,2^d \log n)$ queries, and that the metric entropy satisfies the two-sided estimate $c(1-\varepsilon)\,2^d \log n \leq \log \mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L_2},\varepsilon) \leq 2^{Cd} \log n$.
arXiv Detail & Related papers (2022-03-17T23:52:08Z)
- On the Self-Penalization Phenomenon in Feature Selection
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z)
- DIPPA: An improved Method for Bilinear Saddle Point Problems
This paper deals with bilinear saddle point problems $\min_{\mathbf{x}} \max_{\mathbf{y}}\, g(\mathbf{x}) + \mathbf{x}^{\top}\mathbf{A}\mathbf{y} - h(\mathbf{y})$, where $g$ and $h$ are smooth and strongly convex with condition numbers $\kappa_x$ and $\kappa_y$.
arXiv Detail & Related papers (2021-03-15T10:55:30Z)
- Optimal Mean Estimation without a Variance
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)