High-dimensional optimization for multi-spiked tensor PCA
- URL: http://arxiv.org/abs/2408.06401v1
- Date: Mon, 12 Aug 2024 12:09:25 GMT
- Title: High-dimensional optimization for multi-spiked tensor PCA
- Authors: Gérard Ben Arous, Cédric Gerbelot, Vanessa Piccolo
- Abstract summary: We study the dynamics of two local optimization algorithms, online stochastic gradient descent (SGD) and gradient flow.
For gradient flow, we show that the algorithmic threshold to efficiently recover the first spike is also of order $N^{p-2}$.
Our results are obtained through a detailed analysis of a low-dimensional system that describes the evolution of the correlations between the estimators and the spikes.
- Score: 8.435118770300999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the dynamics of two local optimization algorithms, online stochastic gradient descent (SGD) and gradient flow, within the framework of the multi-spiked tensor model in the high-dimensional regime. This multi-index model arises from the tensor principal component analysis (PCA) problem, which aims to infer $r$ unknown, orthogonal signal vectors within the $N$-dimensional unit sphere through maximum likelihood estimation from noisy observations of an order-$p$ tensor. We determine the number of samples and the conditions on the signal-to-noise ratios (SNRs) required to efficiently recover the unknown spikes from natural initializations. Specifically, we distinguish between three types of recovery: exact recovery of each spike, recovery of a permutation of all spikes, and recovery of the correct subspace spanned by the signal vectors. We show that with online SGD, it is possible to recover all spikes provided the number of samples scales as $N^{p-2}$, aligning with the computational threshold identified in the rank-one tensor PCA problem [Ben Arous, Gheissari, Jagannath 2020, 2021]. For gradient flow, we show that the algorithmic threshold to efficiently recover the first spike is also of order $N^{p-2}$. However, recovering the subsequent directions requires the number of samples to scale as $N^{p-1}$. Our results are obtained through a detailed analysis of a low-dimensional system that describes the evolution of the correlations between the estimators and the spikes. In particular, the hidden vectors are recovered one by one according to a sequential elimination phenomenon: as one correlation exceeds a critical threshold, all correlations sharing a row or column index decrease and become negligible, allowing the subsequent correlation to grow and become macroscopic. The sequence in which correlations become macroscopic depends on their initial values and on the associated SNRs.
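A minimal illustration of the setting (not the paper's exact algorithm): the sketch below runs online SGD on the product of unit spheres for an order-3, multi-spiked model, drawing a fresh noisy tensor at each step and taking a projected gradient step per estimator. The per-sample loss $-\sum_i \langle Y_t, x_i^{\otimes 3}\rangle$, the step-size scaling, the noise normalization, and the absence of an explicit orthogonality constraint are simplifying assumptions made here for illustration.

```python
import numpy as np

def spike_tensor(vs, snrs):
    """Order-3 signal tensor: sum_i lambda_i * v_i (x) v_i (x) v_i."""
    return sum(lam * np.einsum('i,j,k->ijk', v, v, v) for lam, v in zip(snrs, vs))

def online_sgd_tensor_pca(vs, snrs, n_steps=10_000, step=None, seed=0):
    """Illustrative online SGD on the product of unit spheres (p = 3).

    Each step uses a fresh observation Y_t = signal + noise and updates every
    estimator x_i by a spherical (projected) gradient step. Only feasible for
    small N, since a dense N x N x N noise tensor is drawn at every step.
    """
    rng = np.random.default_rng(seed)
    vs = np.asarray(vs, dtype=float)
    r, N = vs.shape
    signal = spike_tensor(vs, snrs)
    step = N ** -1.5 if step is None else step       # heuristic step-size scaling for p = 3
    X = rng.standard_normal((r, N))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # natural (random) initialization
    for _ in range(n_steps):
        Y = signal + rng.standard_normal((N, N, N)) / np.sqrt(N)
        for i in range(r):
            x = X[i]
            # gradient of -<Y, x (x) x (x) x> (symmetric-Y simplification: -3 Y(x, x, .))
            g = -3.0 * np.einsum('ijk,i,j->k', Y, x, x)
            g -= np.dot(g, x) * x                    # project onto the sphere's tangent space
            x = x - step * g
            X[i] = x / np.linalg.norm(x)             # retract back to the unit sphere
    return np.abs(X @ vs.T)                          # correlations |<x_i, v_j>|
```

Tracking the returned correlation matrix over iterations is where the sequential elimination described above would manifest: one entry per row and column grows to a macroscopic value while the others sharing its indices shrink.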
Related papers
- Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent [4.123899820318987]
This paper develops a scaled gradient descent (ScaledGD) algorithm to directly estimate the tensor factors.
In theory, ScaledGD achieves linear convergence at a constant rate that is independent of the condition number of the ground truth low-rank tensor.
Numerical examples are provided to demonstrate the efficacy of ScaledGD in accelerating the convergence rate of ill-conditioned low-rank tensor estimation.
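As a hedged illustration of the preconditioning idea, the sketch below applies it to the simpler low-rank matrix problem $Y \approx LR^\top$ rather than the paper's tensor setting: each factor's gradient is right-multiplied by the inverse Gram matrix of the other factor, which is what removes the dependence on the condition number. The loss, initialization, and step size are illustrative assumptions.

```python
import numpy as np

def scaled_gd_matrix(Y, rank, n_iters=200, step=0.5):
    """Illustrative ScaledGD for the matrix analogue Y ~= L @ R.T.

    The gradient of 0.5 * ||L @ R.T - Y||_F^2 with respect to each factor is
    preconditioned by the inverse Gram matrix of the other factor.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # spectral-style initialization
    L = U[:, :rank] * np.sqrt(s[:rank])
    R = Vt[:rank].T * np.sqrt(s[:rank])
    for _ in range(n_iters):
        E = L @ R.T - Y                                # residual
        L_new = L - step * E @ R @ np.linalg.inv(R.T @ R)
        R_new = R - step * E.T @ L @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
    return L, R
```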
arXiv Detail & Related papers (2025-01-03T08:26:01Z) - Permutation recovery of spikes in noisy high-dimensional tensor estimation [8.435118770300999]
We study the dynamics of gradient flow in high dimensions for the multi-spiked tensor problem.
Our work builds on our companion paper [Ben Arous, Gerbelot, Piccolo], which determines the sample complexity and the separation conditions on the SNRs necessary for ensuring exact recovery.
arXiv Detail & Related papers (2024-12-19T08:59:49Z) - Stochastic gradient descent in high dimensions for multi-spiked tensor PCA [8.435118770300999]
We study the dynamics in high dimensions of online stochastic gradient descent for the multi-spiked tensor model.
We find that the spikes are recovered sequentially in a process we term "sequential elimination".
arXiv Detail & Related papers (2024-10-23T17:20:41Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic methods for optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
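A minimal sketch of the clipping mechanism itself (the threshold, step size, and example loss below are placeholder assumptions, not the paper's exact choices):

```python
import numpy as np

def clipped_sgd(grad_fn, x0, n_steps=2_000, step=0.01, clip=1.0, seed=0):
    """Minimal clipped-SGD sketch: rescale each stochastic gradient so its norm
    never exceeds `clip`, limiting the influence of heavy-tailed noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad_fn(x, rng)                            # one stochastic gradient
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)                      # clip to the threshold
        x -= step * g
    return x

# example: mean estimation under Student-t (heavy-tailed) gradient noise
grad = lambda x, rng: (x - 3.0) + rng.standard_t(df=2, size=x.shape)
estimate = clipped_sgd(grad, np.zeros(5), clip=2.0)
```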
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors [59.33977545294148]
We show that $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
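For intuition, here is a hedged sketch of plain gradient descent on an amplitude-based loss (the random initialization and fixed step size are simplifying assumptions; sample-optimal guarantees typically rely on a spectral initialization and recover the signal only up to a global sign):

```python
import numpy as np

def amplitude_flow(A, y, n_iters=500, step=0.5, seed=0):
    """Gradient descent on f(x) = (1/2m) * sum_i (|a_i^T x| - y_i)^2,
    where y_i = |a_i^T x_star| are magnitude-only measurements."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n)
    x *= np.linalg.norm(y) / (np.sqrt(m) * np.linalg.norm(x))   # rough norm matching
    for _ in range(n_iters):
        Ax = A @ x
        grad = A.T @ ((np.abs(Ax) - y) * np.sign(Ax)) / m
        x -= step * grad
    return x                                                    # recovers x_star up to sign
```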
arXiv Detail & Related papers (2021-06-29T12:49:54Z) - Lattice partition recovery with dyadic CART [79.96359947166592]
We study piece-wise constant signals corrupted by additive Gaussian noise over a $d$-dimensional lattice.
Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature.
In this paper we consider instead the problem of partition recovery, i.e., of estimating the partition of the lattice induced by the constancy regions of the unknown signal.
We prove that a DCART-based procedure consistently estimates the underlying partition at a rate of order $\sigma^2 k^*$
arXiv Detail & Related papers (2021-05-27T23:41:01Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
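The SGLD update itself is simple, and a minimal sketch is given below (the target density, the minibatch-noise model, and the constant step size are illustrative assumptions):

```python
import numpy as np

def sgld(grad_log_p, x0, n_steps=5_000, step=1e-2, seed=0):
    """Minimal SGLD sketch: noisy gradient ascent on log p(x) plus injected
    Gaussian noise of variance 2*step, so iterates approximately sample from p."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    samples = []
    for _ in range(n_steps):
        g = grad_log_p(x, rng)                         # stochastic gradient of log p
        x = x + step * g + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# example: a non-log-concave double-well target p(x) ~ exp(-(x^2 - 1)^2),
# with small additive noise standing in for minibatch gradient noise
grad = lambda x, rng: -4.0 * x * (x**2 - 1.0) + 0.1 * rng.standard_normal(x.shape)
chain = sgld(grad, np.array([0.0]))
```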
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Spatio-Temporal Tensor Sketching via Adaptive Sampling [15.576219771198389]
We propose SkeTenSmooth, a novel tensor factorization framework that uses adaptive sampling to compress tensor slices in a temporally streaming fashion.
Experiments on the New York City Yellow Taxi data show that SkeTenSmooth greatly reduces the memory cost and random sampling rate.
arXiv Detail & Related papers (2020-06-21T23:55:10Z) - The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a heavy-tailed stationary distribution.
We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)