Fundamental computational limits of weak learnability in high-dimensional multi-index models
- URL: http://arxiv.org/abs/2405.15480v3
- Date: Thu, 24 Oct 2024 11:58:59 GMT
- Title: Fundamental computational limits of weak learnability in high-dimensional multi-index models
- Authors: Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala
- Abstract summary: This paper focuses on the minimum sample complexity required for weakly recovering the low-dimensional structure of multi-index models with first-order iterative algorithms.
Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace.
In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge.
- Score: 30.501140910531017
- Abstract: Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n\!=\!\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace consisting of directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$, where $\alpha_{c}$ marks a computational phase transition. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated to these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.
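As a concrete illustration (a minimal sketch, not the paper's code; the link function `g` and all constants below are assumptions made for the example), the snippet draws data from a multi-index model $y = g(Wx)$ with a hidden two-dimensional subspace in the proportional regime $n = \alpha d$, then performs the simplest one-step first-order estimate, the correlation statistic $(1/n)\sum_i y_i x_i$. When the link has a non-zero linear component along a hidden direction, this single step already yields a non-vanishing overlap for any $\alpha>0$, the trivial-subspace scenario of part (i); for links with vanishing first Hermite coefficients the statistic carries no signal.

```python
# Minimal sketch (illustrative, not the paper's code): one-step weak recovery
# in a multi-index model y = g(W x), with W having k orthonormal rows, in the
# proportional regime n = alpha * d.
import numpy as np

rng = np.random.default_rng(0)
d, k, alpha = 1000, 2, 4.0
n = int(alpha * d)

# Hidden subspace: k orthonormal rows of W.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
W = Q.T                                   # shape (k, d)

def g(z):
    # Illustrative link: the first hidden direction has a linear component,
    # so the correlation statistic detects it at any alpha > 0.
    return z[:, 0] + z[:, 0] * z[:, 1]

X = rng.standard_normal((n, d))           # covariates
y = g(X @ W.T)                            # responses depend on x only via W x

# One first-order step: the correlation (first-Hermite) estimate.
w_hat = (y @ X) / n
w_hat /= np.linalg.norm(w_hat)

# Weak recovery = non-vanishing overlap with the hidden subspace span(W).
overlap = np.linalg.norm(W @ w_hat)
print(f"overlap with hidden subspace: {overlap:.3f}")
```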
Related papers
- Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics [21.55547541297847]
We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm.
Under mild distributional assumptions, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity (a minimal Langevin-style training sketch appears after this list).
arXiv Detail & Related papers (2024-08-14T02:13:35Z) - Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension [17.485243410774814]
In traditional models of supervised learning, the goal of a learner is to output a hypothesis that is competitive (to within $\epsilon$) of the best fitting concept from some class.
We introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random perturbations of the input.
We obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in this framework.
arXiv Detail & Related papers (2024-07-01T04:58:36Z) - Stochastic Nested Compositional Bi-level Optimization for Robust Feature
Learning [11.236838268731804]
We develop and analyze algorithms for solving nested bi-level optimization problems.
Our proposed algorithm does not rely on matrix inversions or mini-batches.
arXiv Detail & Related papers (2023-07-11T15:52:04Z) - Nonsmooth automatic differentiation: a cheap gradient principle and
other complexity results [0.0]
We provide a model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs.
Prominent examples are the famous ReLU and convolutional neural networks together with their standard loss functions.
arXiv Detail & Related papers (2022-06-01T08:43:35Z) - Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high-dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
arXiv Detail & Related papers (2021-10-28T14:59:41Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Deep neural networks for inverse problems with pseudodifferential
operators: an application to limited-angle tomography [0.4110409960377149]
We propose a novel convolutional neural network (CNN) designed for learning pseudodifferential operators ($\Psi$DOs) in the context of linear inverse problems.
We show that, under rather general assumptions on the forward operator, the unfolded iterations of ISTA can be interpreted as the successive layers of a CNN (a generic unrolled-ISTA sketch appears after this list).
In particular, we prove that, in the case of LA-CT, the operations of upscaling, downscaling, and convolution can be exactly determined by combining the convolutional nature of the limited-angle X-ray transform and basic properties defining a wavelet system.
arXiv Detail & Related papers (2020-06-02T14:03:41Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Backward Feature Correction: How Deep Learning Performs Deep
(Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic adaptive gradient algorithm for nonconvex-nonconcave minmax problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
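The following is a minimal noisy-gradient (Langevin-type) training sketch for a two-layer network on a two-index target, loosely illustrating the mean-field Langevin entry above; the architecture, step size, temperature, and target are illustrative assumptions, not the settings analyzed in that paper.

```python
# Minimal Langevin-type sketch: two-layer network trained on a two-index
# target with gradient steps plus isotropic Gaussian noise.  All constants
# and the target are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 50, 100, 2000               # input dim, hidden width, samples
lr, temp, steps = 0.05, 1e-5, 2000    # step size, Langevin temperature, iterations

# Teacher: a two-index target that depends on x only through two directions.
U, _ = np.linalg.qr(rng.standard_normal((d, 2)))
X = rng.standard_normal((n, d))
Z = X @ U
y = np.tanh(Z[:, 0]) * Z[:, 1]

# Student: two-layer network f(x) = sum_j a_j * tanh(w_j . x).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

def mse():
    return np.mean((np.tanh(X @ W.T) @ a - y) ** 2)

print(f"initial MSE: {mse():.3f}")
for _ in range(steps):
    H = np.tanh(X @ W.T)                          # (n, m) hidden activations
    err = H @ a - y                               # (n,)
    grad_a = H.T @ err / n                        # gradient of 0.5 * mean(err^2)
    grad_W = ((err[:, None] * a[None, :] * (1 - H ** 2)).T @ X) / n
    # Langevin update: gradient step plus isotropic Gaussian noise.
    a -= lr * grad_a + np.sqrt(2 * lr * temp) * rng.standard_normal(m)
    W -= lr * grad_W + np.sqrt(2 * lr * temp) * rng.standard_normal((m, d))
print(f"final MSE:   {mse():.3f}")
```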
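For the pseudodifferential-operator entry above, which interprets the unfolded iterations of ISTA as successive layers of a CNN, the sketch below unrolls ISTA for a generic sparse linear inverse problem. The random forward operator `A`, the identity dictionary, and all constants are illustrative assumptions; the paper's setting is the limited-angle X-ray transform combined with a wavelet system.

```python
# Minimal sketch of "unrolling" ISTA: each iteration is a fixed linear map
# followed by a soft-thresholding nonlinearity, i.e. one "layer" of a network.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the layer's nonlinearity)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_unrolled(A, y, lam, n_layers):
    """Unrolled ISTA: x_{t+1} = S_{lam/L}( x_t - (1/L) A^T (A x_t - y) ),
    applied for a fixed number of layers."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x

# Usage: recover a sparse signal from noisy random linear measurements.
rng = np.random.default_rng(2)
n_meas, dim = 80, 200
A = rng.standard_normal((n_meas, dim)) / np.sqrt(n_meas)
x_true = np.zeros(dim)
x_true[rng.choice(dim, 10, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(n_meas)
x_hat = ista_unrolled(A, y, lam=0.02, n_layers=200)
print(f"relative error: {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.3f}")
```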