Binary perceptron computational gap -- a parametric fl RDT view
- URL: http://arxiv.org/abs/2511.01037v1
- Date: Sun, 02 Nov 2025 18:23:49 GMT
- Title: Binary perceptron computational gap -- a parametric fl RDT view
- Authors: Mihailo Stojnic
- Abstract summary: The asymmetric binary perceptron (ABP) likely exhibits two phase-transitioning constraint density thresholds. We consider a particular parametric utilization of \emph{fully lifted random duality theory} (fl RDT) [85] and study its potential algorithmic implications for the ABP.
- Score: 2.538209532048867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies suggest that the asymmetric binary perceptron (ABP) likely exhibits the so-called statistical-computational gap, characterized by the appearance of two phase-transitioning constraint density thresholds: \textbf{\emph{(i)}} the \emph{satisfiability threshold} $\alpha_c$, below/above which the ABP succeeds/fails to operate as a storage memory; and \textbf{\emph{(ii)}} the \emph{algorithmic threshold} $\alpha_a$, below/above which one can/cannot efficiently determine the ABP's weights so that it operates as a storage memory. We consider a particular parametric utilization of \emph{fully lifted random duality theory} (fl RDT) [85] and study its potential algorithmic implications for the ABP. A remarkable structural parametric change is uncovered as one progresses through the fl RDT lifting levels. On the first two levels, the so-called $c$ sequence -- a key parametric fl RDT component -- is of the (natural) decreasing type. A change of this phenomenology on higher levels is then connected to the $\alpha_c$ -- $\alpha_a$ threshold change. Namely, on the second level, concrete numerical evaluations give the critical constraint density $\alpha=\alpha_c\approx 0.8331$. While progressing through higher levels decreases this estimate, already on the fifth level we observe a satisfactory level of convergence and obtain $\alpha\approx 0.7764$. This allows us to draw two striking parallels: \textbf{\emph{(i)}} the obtained constraint density estimate is in remarkable agreement with the range $\alpha\in (0.77,0.78)$ of clustering defragmentation (believed to be responsible for the failure of locally improving algorithms) [17,88]; and \textbf{\emph{(ii)}} the observed change of the $c$ sequence phenomenology closely matches that of the negative Hopfield model, for which the existence of efficient algorithms that closely approach a similar type of threshold has been demonstrated recently [87].
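The ordering of the $c$ sequence acts as the paper's diagnostic for the threshold change. As a purely illustrative aid, here is a minimal Python sketch of such a monotonicity check; the `c_seq` values are made up and the function is not part of the paper or of [85]:

```python
import numpy as np

def c_sequence_type(c_seq, tol=1e-12):
    """Classify a c sequence as (naturally) 'decreasing' or 'non-monotone'.

    The abstract associates the loss of the decreasing ordering on higher
    lifting levels with the alpha_c -> alpha_a threshold change.
    """
    diffs = np.diff(np.asarray(c_seq, dtype=float))
    return "decreasing" if np.all(diffs <= tol) else "non-monotone"

# Made-up sequences standing in for two hypothetical lifting levels.
print(c_sequence_type([1.0, 0.62, 0.31]))        # -> decreasing
print(c_sequence_type([1.0, 0.62, 0.71, 0.30]))  # -> non-monotone
```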
Related papers
- Parametric RDT approach to computational gap of symmetric binary perceptron [2.538209532048867]
We study the potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of \emph{fully lifted random duality theory} (fl RDT). A structural change from a decreasingly to an arbitrarily ordered $c$-sequence is observed on the second lifting level and associated with the \emph{satisfiability} threshold.
arXiv Detail & Related papers (2026-01-15T17:48:58Z)
- Theoretical Framework for Tempered Fractional Gradient Descent: Application to Breast Cancer Classification [0.0]
This paper introduces Tempered Fractional Gradient Descent (TFGD), a novel optimization framework that synergizes fractional calculus with exponential tempering to enhance gradient-based learning. TFGD addresses the limitations of standard gradient methods by incorporating a tempered memory mechanism, where historical gradients are weighted by fractional coefficients $|w_j| = \binom{\alpha}{j}$ and exponentially decayed via a tempering parameter $\lambda$. Empirical validation on the Breast Cancer Wisconsin dataset demonstrates TFGD's superiority, achieving 98.25% test accuracy (vs. 92.11% for SGD) and $2\times$ faster convergence.
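The summary pins down the memory weights, which makes a toy sketch possible. Below is a minimal, illustrative TFGD-style update on a quadratic loss; the hyperparameter values and the exact way the tempering factor $e^{-\lambda j}$ multiplies the coefficients are assumptions, not the paper's specification:

```python
import numpy as np
from scipy.special import binom

def tfgd_step(w, grads, alpha=0.6, lam=0.1, lr=0.05):
    """One tempered fractional gradient step (sketch).

    Historical gradients (newest first) are weighted by the fractional
    coefficients |binom(alpha, j)| and damped by exp(-lam * j)."""
    direction = np.zeros_like(w)
    for j, g in enumerate(reversed(grads)):   # j = 0 is the newest gradient
        direction += abs(binom(alpha, j)) * np.exp(-lam * j) * g
    return w - lr * direction

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient at w is w.
w, history = np.array([3.0, -2.0]), []
for _ in range(50):
    history.append(w.copy())
    w = tfgd_step(w, history)
print(w)  # approaches the minimizer [0, 0]
```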
arXiv Detail & Related papers (2025-04-26T08:26:34Z)
- A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper studies bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level function is strongly convex. Existing algorithms rely on nested loops, which require significant tuning effort and are not practical.
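For contrast with the nested-loop approach, here is a minimal sketch of the single-loop idea on a toy quadratic bilevel problem (one lower-level step and one upper-level step per iteration); the problem and step sizes are illustrative, not the paper's algorithm:

```python
# Toy bilevel problem: min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y),
# where g(x, y) = 0.5 * (y - x)**2 (strongly convex in y)
# and   f(x, y) = 0.5 * (y - 1)**2.
def single_loop_bilevel(x=5.0, y=0.0, lr_x=0.05, lr_y=0.5, iters=300):
    for _ in range(iters):
        y -= lr_y * (y - x)       # one lower-level gradient step on g
        # Here y*(x) = x, so dy*/dx = 1 and the hypergradient of
        # f(x, y*(x)) is (y - 1); no inner loop is run to convergence.
        x -= lr_x * (y - 1.0)
    return x, y

print(single_loop_bilevel())  # both coordinates approach 1.0
```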
arXiv Detail & Related papers (2024-12-28T04:40:27Z)
- Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown. We use a ratio-free nested multi-time-scale stochastic approximation (SA) method that simultaneously tracks the score and drives the parameter update. We show that our algorithm can eliminate the original bias $O\big(\sqrt{\frac{1}{N}}\big)$ and accelerate the convergence rate from $O\big(\beta_k+\sqrt{\frac{\alpha_k}{N}}\big)$.
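A minimal illustration of the multi-time-scale pattern on a toy Gaussian simulator: a fast recursion (step size $\alpha_k$) tracks an auxiliary statistic standing in for the score, while a slow recursion (step size $\beta_k$) drives the parameter. Everything here is illustrative rather than the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, s = 0.0, 0.0     # slow parameter estimate and fast tracked statistic
true_mean = 2.0         # the simulator's unknown parameter

for k in range(1, 20001):
    alpha_k = k ** -0.6   # fast time scale: tracks the statistic
    beta_k = 1.0 / k      # slow time scale: drives the parameter
    x = true_mean + rng.normal()        # one simulator draw (no likelihood used)
    s += alpha_k * ((x - theta) - s)    # track E[x - theta], a score stand-in
    theta += beta_k * s                 # move theta along the tracked signal

print(theta)  # approaches 2.0
```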
arXiv Detail & Related papers (2024-11-20T02:46:15Z)
- From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard bounded-variance assumptions in practice. In this paper, we show that it is possible to achieve tight convergence guarantees under heavy-tailed noise.
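For reference, a minimal sketch of the two updates the title contrasts; the threshold and step size are illustrative:

```python
import numpy as np

def clipped_step(w, g, lr=0.1, tau=1.0):
    """Gradient clipping: rescale g only when its norm exceeds tau."""
    n = np.linalg.norm(g)
    scale = min(1.0, tau / n) if n > 0 else 1.0
    return w - lr * scale * g

def normalized_step(w, g, lr=0.1, eps=1e-8):
    """Normalized SGD: always step along g / ||g||, so a single
    heavy-tailed gradient cannot produce an unbounded update."""
    return w - lr * g / (np.linalg.norm(g) + eps)
```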
arXiv Detail & Related papers (2024-10-17T17:59:01Z)
- Theoretical limits of descending $\ell_0$ sparse-regression ML algorithms [0.0]
We develop a generic analytical program for studying the performance of \emph{maximum-likelihood} (ML) decoding.
A key ML performance parameter, the residual \emph{root mean square error} ($\textbf{RMSE}$), is uncovered to exhibit the so-called \emph{phase-transition} (PT) phenomenon.
Concrete implementation and practical relevance of the fl RDT typically rely on the ability to conduct a sizeable set of the underlying numerical evaluations.
arXiv Detail & Related papers (2024-10-10T06:33:41Z)
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex and the lower-level problem is strongly convex. These problems have significant applications in machine learning, such as text classification using recurrent neural networks. We propose a new accelerated bilevel optimization algorithm named AccBO.
arXiv Detail & Related papers (2024-09-28T02:30:44Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
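A minimal sketch of the over-the-air aggregation pattern with an AdaGrad-style server step; the channel-noise model, the toy objective, and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_clients = 4, 10
w, accum = np.zeros(d), np.zeros(d)
optimum = np.ones(d)

for t in range(500):
    # Each client holds a noisy gradient of 0.5 * ||w - optimum||^2.
    grads = [(w - optimum) + 0.1 * rng.normal(size=d) for _ in range(num_clients)]
    # Over-the-air aggregation: the channel superimposes the transmitted
    # signals and adds receiver noise; individual gradients are never seen.
    g = (np.sum(grads, axis=0) + 0.05 * rng.normal(size=d)) / num_clients
    accum += g ** 2                              # AdaGrad accumulator
    w -= 0.5 * g / (np.sqrt(accum) + 1e-8)       # adaptive server step

print(w)  # approaches the optimum [1, 1, 1, 1]
```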
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Stochastic Gradient Succeeds for Bandits [64.17904367852563]
We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an $O(1/t)$ rate.
Remarkably, global convergence of the gradient bandit algorithm has not been previously established.
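For concreteness, a minimal sketch of the gradient bandit algorithm the result concerns: softmax action preferences updated with a stochastic policy-gradient step. The 3-arm instance, noise level, and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.1, 0.5, 0.9])   # true mean rewards, unknown to the learner
theta = np.zeros(3)                 # action preferences

for t in range(5000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                            # softmax policy over the arms
    a = rng.choice(3, p=pi)                   # sample an arm
    r = means[a] + 0.1 * rng.normal()         # observe a noisy reward
    theta += 0.1 * r * (np.eye(3)[a] - pi)    # unbiased policy-gradient step

print(pi)  # probability mass concentrates on the best arm (index 2)
```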
arXiv Detail & Related papers (2024-02-27T06:05:01Z)
- Asynchronous Training Schemes in Distributed Learning with Time Delay [17.259708772713164]
In the context of distributed deep learning, the issue of stale weights or gradients could result in poor algorithmic performance.
In this paper, we propose a different approach to tackle the issue of stale weights or gradients.
A practical variant of the proposed PC-ASGD algorithm is also obtained by adopting a condition that helps determine the tradeoff parameter.
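A minimal, speculative sketch of the predict/clip idea suggested by the algorithm's name: a delayed gradient is either corrected by a first-order prediction or down-weighted ("clipped"), with a tradeoff parameter mixing the two. The correction rule, the 0.5 damping, and the toy objective are assumptions, not the paper's exact construction:

```python
import numpy as np

def pc_asgd_step(w, g_stale, g_predicted, lr=0.1, tradeoff=0.5):
    """Mix a delay-compensating predicted gradient with a damped
    ("clipped") stale gradient; `tradeoff` balances the two terms."""
    g_clipped = 0.5 * g_stale
    g = tradeoff * g_predicted + (1.0 - tradeoff) * g_clipped
    return w - lr * g

# Toy usage on f(w) = 0.5 * ||w||^2 (gradient = w) with a one-step delay.
w_prev, w = np.array([2.0, 2.0]), np.array([1.8, 1.8])
g_stale = w_prev                        # gradient computed at the old iterate
g_predicted = g_stale + (w - w_prev)    # first-order correction (exact here,
                                        # since the Hessian is the identity)
print(pc_asgd_step(w, g_stale, g_predicted))
```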
arXiv Detail & Related papers (2022-08-28T07:14:59Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
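For orientation, a minimal sketch of the plain extragradient step the method builds on, applied to a bilinearly coupled toy problem; the acceleration component is omitted and all constants are illustrative:

```python
# Toy saddle problem: min_x max_y 0.5*x**2 + x*y - 0.5*y**2 (solution x = y = 0).
def grad_x(x, y): return x + y   # partial derivative in x
def grad_y(x, y): return x - y   # partial derivative in y

x, y, lr = 3.0, -2.0, 0.3
for _ in range(100):
    # Extrapolation: probe the gradients at a look-ahead point.
    xh, yh = x - lr * grad_x(x, y), y + lr * grad_y(x, y)
    # Update: step from the current point using the look-ahead gradients.
    x, y = x - lr * grad_x(xh, yh), y + lr * grad_y(xh, yh)

print(x, y)  # both approach the saddle point at 0
```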
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics [6.287145010885044]
This paper establishes a novel regret upper bound of $O_p(\sqrt{T}\,\text{polylog}(T))$.
It simultaneously establishes an estimation error bound on the dynamics of $O_p(\sqrt{T}\,\text{polylog}(T))$.
arXiv Detail & Related papers (2022-02-11T17:50:14Z)
- Differentially Quantized Gradient Methods [53.3186247068836]
We show that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}},\, \rho_n 2^{-R}\}$.
No algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}},\, 2^{-R}\}$.
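A minimal sketch of the differential-quantization idea of feeding quantization errors back into the next transmission so they do not accumulate; the uniform $R$-bit quantizer, toy objective, and constants are illustrative assumptions:

```python
import numpy as np

def uniform_quantize(v, rate_bits, dyn_range=10.0):
    """Uniform scalar quantizer with 2**rate_bits levels over [-dyn_range, dyn_range]."""
    step = 2.0 * dyn_range / 2 ** rate_bits
    return np.round(v / step) * step

# Gradient descent on f(w) = 0.5 * ||w||^2 with R-bit quantized gradients.
w, err, lr, R = np.array([8.0, -6.0]), np.zeros(2), 0.4, 4
for _ in range(60):
    g = w                                # exact gradient of f at w
    q = uniform_quantize(g + err, R)     # transmit the error-compensated gradient
    err = (g + err) - q                  # carry the quantization error forward
    w -= lr * q                          # receiver applies the quantized gradient

print(w)  # settles in a small neighborhood of [0, 0] despite 4-bit gradients
```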
arXiv Detail & Related papers (2020-02-06T20:40:53Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
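For reference, a minimal sketch of the optimistic gradient step that such adaptive variants build on: the look-ahead term $2g_t - g_{t-1}$ is what prevents the cycling that plain descent-ascent exhibits on bilinear games. The toy game and constants are illustrative:

```python
# Bilinear game min_x max_y x*y, where plain gradient descent-ascent cycles.
x, y, lr = 1.0, 1.0, 0.1
gx_prev, gy_prev = 0.0, 0.0
for _ in range(2000):
    gx, gy = y, x                    # gradients of x*y w.r.t. x and y
    x -= lr * (2 * gx - gx_prev)     # optimistic descent step on x
    y += lr * (2 * gy - gy_prev)     # optimistic ascent step on y
    gx_prev, gy_prev = gx, gy

print(x, y)  # spirals in toward the equilibrium (0, 0)
```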
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.