Deep Learning of Compositional Targets with Hierarchical Spectral Methods
- URL: http://arxiv.org/abs/2602.10867v1
- Date: Wed, 11 Feb 2026 13:54:20 GMT
- Title: Deep Learning of Compositional Targets with Hierarchical Spectral Methods
- Authors: Hugo Tabanelli, Yatin Dandi, Luca Pesce, Florent Krzakala,
- Abstract summary: Why depth yields a genuine computational advantage over shallow methods remains a central open question in learning theory. We study this question in a controlled high-dimensional setting, focusing on compositional target functions. We analyze their learnability using an explicit three-layer fitting model trained via layer-wise spectral estimators.
- Score: 19.741463287401697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Why depth yields a genuine computational advantage over shallow methods remains a central open question in learning theory. We study this question in a controlled high-dimensional Gaussian setting, focusing on compositional target functions. We analyze their learnability using an explicit three-layer fitting model trained via layer-wise spectral estimators. Although the target is globally a high-degree polynomial, its compositional structure allows learning to proceed in stages: an intermediate representation reveals structure that is inaccessible at the input level. This reduces learning to simpler spectral estimation problems, well studied in the context of multi-index models, whereas any shallow estimator must resolve all components simultaneously. Our analysis relies on Gaussian universality, leading to sharp separations in sample complexity between two- and three-layer learning strategies.
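The staged strategy in the abstract can be illustrated on a toy single-index instance: first recover a hidden direction with a spectral (second-moment) estimator, then fit the outer link on the resulting one-dimensional representation. The sketch below is our own minimal illustration, not the paper's three-layer estimator; the target function, the spiked-matrix construction, and all variable names are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20000

# Hidden direction and a compositional target y = phi(sigma(<w, x>)).
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
X = rng.standard_normal((n, d))  # high-dimensional Gaussian inputs
z = X @ w
y = np.tanh(z**2 - 1.0)  # inner feature z^2 - 1, outer link tanh

# Stage 1: spectral estimation of the hidden direction.  The matrix
# M = (1/n) sum_i y_i (x_i x_i^T - I) concentrates around a rank-one
# spike proportional to w w^T whenever E[y * (z^2 - 1)] != 0.
M = (X.T * y) @ X / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)
w_hat = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
print("overlap |<w_hat, w>|:", abs(w_hat @ w))  # close to 1 for n >> d

# Stage 2: given the intermediate 1-D representation, fitting the outer
# link reduces to a plain scalar regression (here a crude polynomial fit).
z_hat = X @ w_hat
coeffs = np.polyfit(z_hat, y, deg=6)
r2 = 1.0 - np.var(y - np.polyval(coeffs, z_hat)) / np.var(y)
print("stage-2 R^2:", r2)
```

The point mirrors the abstract: once the intermediate representation is available, what remains is a simple low-dimensional estimation problem, whereas a shallow method must resolve the full high-degree dependence on x at once.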
Related papers
- Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining [58.69016084278948]
We consider a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class.
arXiv Detail & Related papers (2026-01-27T16:19:54Z) - How do Transformers Learn Implicit Reasoning? [67.02072851088637]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z) - InDeed: Interpretable image deep decomposition with guaranteed generalizability [28.595151003310452]
Image decomposition aims to analyze an image into elementary components. Deep learning can be powerful for such tasks, but combining it with interpretability and generalizability is rarely explored. We introduce a novel framework for interpretable deep image decomposition, combining hierarchical Bayesian modeling and deep learning.
arXiv Detail & Related papers (2025-01-02T07:58:26Z) - Fundamental computational limits of weak learnability in high-dimensional multi-index models [30.501140910531017]
This paper focuses on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms. Our findings unfold in three parts: (i) we identify the conditions under which a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha > 0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace; and (iii) in a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found
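For context, the multi-index setting and the sample ratio $\alpha$ referenced in this summary can be written as follows (standard notation in this literature, stated here as an assumption rather than quoted from the paper):

```latex
% Multi-index model: y depends on x in R^d only through a fixed
% k-dimensional projection, observed in the proportional regime.
y = g\big(\langle w_1, x\rangle, \dots, \langle w_k, x\rangle\big),
\qquad x \sim \mathcal{N}(0, I_d), \qquad n = \alpha d, \quad d \to \infty .
```

Under this convention, $\alpha_c$ is the critical sample-to-dimension ratio below which first-order iterative algorithms fail to weakly recover $\mathrm{span}(w_1, \dots, w_k)$.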
arXiv Detail & Related papers (2024-05-24T11:59:02Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [44.31729147722701]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory [0.0]
Clustering, classification, and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure. This paper introduces three interpretable approaches: segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion, and representation via the Maximal Coding Rate Reduction criterion.
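All three criteria are built on the same lossy coding-rate functional in its usual log-det form; the sketch below illustrates that shared ingredient. The function names and default distortion level are ours, and this is an illustration of the quantity rather than the paper's full algorithms.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Rate (in nats) needed to encode the n columns of Z (d x n)
    up to mean squared distortion eps^2, via the log-det formula."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet

def coding_rate_reduction(Z, labels, eps=0.5):
    """Whole-set rate minus the weighted per-cluster rates: large when
    clusters are individually compact but jointly spread out (the
    Maximal Coding Rate Reduction idea)."""
    n = Z.shape[1]
    parts = sum(
        (np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return coding_rate(Z, eps) - parts
```

Segmentation-style criteria minimize the total coding length of the partitioned data, while representation learning maximizes the reduction computed above.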
arXiv Detail & Related papers (2023-02-21T01:15:08Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Towards Understanding Mixture of Experts in Deep Learning [95.27215939891511]
We study how the MoE layer improves the performance of neural network learning.
Our results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE.
arXiv Detail & Related papers (2022-08-04T17:59:10Z) - Spectral Analysis Network for Deep Representation Learning and Image Clustering [53.415803942270685]
This paper proposes a new network structure for unsupervised deep representation learning based on spectral analysis.
It can identify local similarities among images at the patch level and is thus more robust against occlusion. It can learn more clustering-friendly representations and reveal deep correlations among data samples.
arXiv Detail & Related papers (2020-09-11T05:07:15Z) - Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
arXiv Detail & Related papers (2020-06-24T00:26:35Z) - Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z) - Convolutional Spectral Kernel Learning [21.595130250234646]
We build an interpretable convolutional spectral kernel network (CSKN) based on the inverse Fourier transform.
We derive the generalization error bounds and introduce two regularizers to improve the performance.
Experimental results on real-world datasets validate the effectiveness of the learning framework.
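The inverse-Fourier construction mentioned in this summary rests on Bochner's theorem: a stationary kernel is the inverse Fourier transform of a nonnegative spectral density, so sampling frequencies from that density yields random Fourier features whose inner products approximate the kernel. The sketch below is a generic illustration of this correspondence with a fixed Gaussian density (recovering the RBF kernel), not the paper's learned CSKN architecture; all names and parameter choices are ours.

```python
import numpy as np

def random_fourier_features(X, n_features=256, lengthscale=1.0, seed=0):
    """Map X (m x d) to features whose inner products approximate a
    stationary kernel; frequencies are sampled from the kernel's
    spectral density (here Gaussian, i.e. the RBF kernel)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale  # w ~ N(0, I/l^2)
    b = rng.uniform(0.0, 2 * np.pi, n_features)  # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: phi(x) @ phi(y) ~ exp(-||x - y||^2 / 2) for lengthscale 1.
X = np.random.default_rng(1).standard_normal((5, 3))
phi = random_fourier_features(X, n_features=20000)
approx = phi @ phi.T
exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(approx - exact).max())  # small approximation error
```

A learned spectral kernel replaces the fixed Gaussian density with a parameterized one, which is where the frequency-domain interpretability comes from.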
arXiv Detail & Related papers (2020-02-28T14:35:54Z)