Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining
- URL: http://arxiv.org/abs/2601.19756v1
- Date: Tue, 27 Jan 2026 16:19:54 GMT
- Title: Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining
- Authors: Yunwei Ren, Yatin Dandi, Florent Krzakala, Jason D. Lee
- Abstract summary: We consider a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class.
- Score: 58.69016084278948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning theory, most optimization results still focus on networks with only two or three layers, leaving the theoretical understanding of hierarchical learning in genuinely deep models limited. This leads to a natural question: can we prove that deep networks, trained by gradient-based methods, can efficiently exploit hierarchical structure? In this work, we consider Random Hierarchy Models -- a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class. Our proof builds on a general observation: if intermediate layers can receive clean signal from the labels and the relevant features are weakly identifiable, then layerwise training each individual layer suffices to hierarchically learn the target function.
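To make the data model concrete, below is a minimal sketch of an RHM-style sampler. It is a hedged illustration rather than the authors' construction: the parameter names (vocabulary size v, m production rules per symbol, branching factor s, depth L) follow the usual description of the model in arXiv:2307.02129, and constraints such as distinct productions across symbols are omitted.

```python
# Minimal, illustrative sampler for a Random Hierarchy Model (RHM).
# NOT the authors' code: parameters (v, m, s, L) follow the usual RHM
# description in arXiv:2307.02129; uniqueness constraints on productions
# are simplified away.
import numpy as np

rng = np.random.default_rng(0)

def make_rules(v: int, m: int, s: int, L: int):
    """For each of the L levels, give every symbol m random length-s productions."""
    return [
        {a: [rng.integers(0, v, size=s) for _ in range(m)] for a in range(v)}
        for _ in range(L)
    ]

def sample(label: int, rules) -> np.ndarray:
    """Expand a class label into a leaf string, top-down and left-to-right,
    choosing one of the m productions uniformly at random for each symbol."""
    seq = [label]
    for level in rules:
        expanded = []
        for sym in seq:
            prod = level[sym][rng.integers(0, len(level[sym]))]
            expanded.extend(int(t) for t in prod)
        seq = expanded
    return np.array(seq)  # input of length s**L; `label` is the target class

# Usage: v=8 symbols, m=2 rules per symbol, branching s=2, depth L=3.
rules = make_rules(v=8, m=2, s=2, L=3)
x = sample(label=3, rules=rules)
print(x.shape)  # (8,) -- i.e. s**L leaves
```

Layerwise training, in the sense of the abstract, would then fit one convolutional block at a time on (input, label) pairs drawn from such a sampler, freezing each block before training the next.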
Related papers
- Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate. This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z) - Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis [8.080026425139708]
We show that hierarchically modular neural networks offer benefits such as learning efficiency, generalization, multi-task learning, and transfer.
We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference.
arXiv Detail & Related papers (2023-05-28T15:12:32Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [44.31729147722701]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - The staircase property: How hierarchical structure can guide deep learning [38.713566366330326]
This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically.
We prove that functions satisfying this property can be learned in polynomial time using layerwise coordinate descent on regular neural networks.
arXiv Detail & Related papers (2021-08-24T08:19:05Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched-prior-based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Gradient-based Competitive Learning: Theory [1.6752712949948443]
This paper introduces a novel perspective by combining gradient-based and competitive learning.
The theory is based on the intuition that neural networks are able to learn topological structures by working directly on the transpose of the input matrix.
The proposed approach has great potential, as it can be generalized to a vast selection of topological learning tasks.
arXiv Detail & Related papers (2020-09-06T19:00:51Z) - Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
arXiv Detail & Related papers (2020-06-24T00:26:35Z)