How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
- URL: http://arxiv.org/abs/2307.02129v5
- Date: Wed, 3 Jul 2024 07:57:00 GMT
- Title: How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
- Authors: Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart
- Abstract summary: We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
- Score: 47.617093812158366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional data representations. However, how many training examples are required to learn such representations remains unknown. To quantitatively study this question, we introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images. The model is a classification task where each class corresponds to a group of high-level features, chosen among several equivalent groups associated with the same class. In turn, each feature corresponds to a group of sub-features chosen among several equivalent ones and so on, following a hierarchy of composition rules. We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups. Moreover, the number of data required corresponds to the point where correlations between low-level features and classes become detectable. Overall, our results indicate how deep networks overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a hierarchical task.
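To make the generative process concrete, here is a minimal Python sketch of sampling from an RHM-like task. The parameter names (v: vocabulary size, m: equivalent productions per symbol, s: branching factor, L: depth) follow the description above, but the unconstrained random rule tables are an illustrative assumption; in particular, this toy version does not enforce the paper's constraints on how productions are shared across symbols.

```python
import numpy as np

rng = np.random.default_rng(0)
v, m, s, L = 8, 2, 2, 3  # vocabulary size, equivalent rules per symbol, branching factor, depth

# One rule table per level: each symbol gets m "equivalent" productions,
# each production being a tuple of s lower-level symbols.
rules = [
    {sym: rng.integers(0, v, size=(m, s)) for sym in range(v)}
    for _ in range(L)
]

def sample_input(label):
    """Expand a class label down the hierarchy into a leaf-level input of length s**L."""
    layer = [label]
    for level in range(L):
        expanded = []
        for sym in layer:
            production = rules[level][sym][rng.integers(m)]  # pick one of the m equivalent rules
            expanded.extend(production.tolist())
        layer = expanded
    return np.array(layer)

x, y = sample_input(label=3), 3
print(y, x)  # class label and its s**L low-level features
```

A network trained on such (x, label) pairs can only generalize by treating the m equivalent productions of a symbol as interchangeable, which is the invariance the abstract refers to.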
Related papers
- How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model [4.215221129670858]
We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations.
We quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
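As a rough illustration of the sparsity idea (an assumption about the construction, not the SRHM's exact definition), one can dilute the informative features with uninformative "blank" symbols placed at random positions within fixed-size slots, so that small discrete shifts of the informative content leave the label unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
BLANK = -1  # hypothetical uninformative symbol

def sparsify(features, slots_per_feature=2):
    """Place each informative feature at a random position inside a block of
    otherwise-blank slots (a toy analogue of the sparsity, not the SRHM itself)."""
    out = np.full(len(features) * slots_per_feature, BLANK)
    for i, f in enumerate(features):
        out[i * slots_per_feature + rng.integers(slots_per_feature)] = f
    return out

print(sparsify(np.array([4, 7, 2, 5])))  # informative features scattered among blanks
```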
arXiv Detail & Related papers (2024-04-16T17:01:27Z) - Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis [8.080026425139708]
We show that hierarchically modular neural networks offer benefits such as learning efficiency, generalization, multi-task learning, and transfer.
We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference.
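A minimal sketch of the edge-pruning ingredient, here as plain magnitude pruning of a Linear layer; the paper's iterative schedule, unit pruning, and graph-based module detection are not reproduced:

```python
import torch
import torch.nn as nn

def prune_smallest_edges(layer: nn.Linear, fraction: float = 0.1) -> None:
    """Zero out the smallest-magnitude weights (edges) of a Linear layer in place."""
    with torch.no_grad():
        w = layer.weight
        k = max(1, int(fraction * w.numel()))
        threshold = w.abs().flatten().kthvalue(k).values
        w[w.abs() <= threshold] = 0.0

layer = nn.Linear(16, 8)
prune_smallest_edges(layer, fraction=0.2)
print((layer.weight == 0).float().mean())  # roughly the pruned fraction
```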
arXiv Detail & Related papers (2023-05-28T15:12:32Z) - The Group Loss++: A deeper look into group loss for deep metric learning [65.19665861268574]
Group Loss is a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group.
We show state-of-the-art results on clustering and image retrieval on four datasets, and present competitive results on two person re-identification datasets.
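A hedged sketch of the label-propagation idea behind such a loss: refine each sample's class probabilities with similarity-weighted support from the rest of the batch before applying the classification loss. The similarity measure, normalisation, and number of refinement steps here are simplifications, not the exact Group Loss formulation:

```python
import torch
import torch.nn.functional as F

def group_refinement_loss(embeddings, probs, labels, steps=1):
    """Differentiable label-propagation-style refinement over a batch (a sketch only)."""
    sim = torch.relu(torch.corrcoef(embeddings))  # pairwise Pearson similarity, negatives clipped
    for _ in range(steps):
        support = sim @ probs                      # similarity-weighted class support
        probs = probs * support
        probs = probs / probs.sum(dim=1, keepdim=True)
    return F.nll_loss(probs.clamp_min(1e-8).log(), labels)

emb = torch.randn(32, 128)
p = torch.softmax(torch.randn(32, 10), dim=1)
y = torch.randint(0, 10, (32,))
print(group_refinement_loss(emb, p, y))
```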
arXiv Detail & Related papers (2022-04-04T14:09:58Z) - Computing Class Hierarchies from Classifiers [12.631679928202516]
We propose a novel algorithm for automatically acquiring a class hierarchy from a neural network.
Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models.
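One simple way to derive a class hierarchy from a trained classifier (an illustrative choice, not necessarily the paper's algorithm) is to agglomeratively cluster the final-layer class weight vectors:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Stand-in for a trained classifier's (num_classes x feature_dim) output-layer weights.
class_prototypes = np.random.randn(10, 64)

# Agglomerative clustering of class prototypes induces a binary class hierarchy.
Z = linkage(class_prototypes, method="average", metric="cosine")
print(dendrogram(Z, no_plot=True)["ivl"])  # class indices in hierarchy (leaf) order
```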
arXiv Detail & Related papers (2021-12-02T13:01:04Z) - Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic and stochastic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
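A hedged sketch of one way to "randomly eliminate class information" in an iteration, here read as excluding a random subset of classes from the segmentation loss; the paper's exact scheme may differ:

```python
import torch
import torch.nn.functional as F

def class_dropping_loss(logits, target, num_classes, drop_prob=0.2, ignore_index=255):
    """Ignore pixels of randomly selected classes in this iteration's cross-entropy,
    so the model cannot rely on co-occurrence cues between classes (illustrative)."""
    dropped = torch.nonzero(torch.rand(num_classes) < drop_prob).flatten().tolist()
    masked = target.clone()
    for c in dropped:
        masked[masked == c] = ignore_index
    return F.cross_entropy(logits, masked, ignore_index=ignore_index)

logits = torch.randn(2, 21, 8, 8)         # (batch, classes, H, W)
target = torch.randint(0, 21, (2, 8, 8))  # (batch, H, W)
print(class_dropping_loss(logits, target, num_classes=21))
```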
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning [1.0062040918634414]
Few-shot learning algorithms are designed to generalize well to new tasks with limited data.
We introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data.
arXiv Detail & Related papers (2020-09-23T17:01:09Z) - Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
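A minimal sketch of the kind of unit-to-concept matching such analyses rely on: threshold a unit's (upsampled) activation map and score its overlap with a segmentation mask of a candidate concept. The thresholding rule and data here are placeholders:

```python
import numpy as np

def unit_concept_iou(activation_map, concept_mask, threshold):
    """IoU between a thresholded unit activation map and a binary concept mask."""
    active = activation_map > threshold
    intersection = np.logical_and(active, concept_mask).sum()
    union = np.logical_or(active, concept_mask).sum()
    return intersection / max(union, 1)

act = np.random.rand(56, 56)                                      # a hidden unit's activation map
mask = np.zeros((56, 56), dtype=bool); mask[10:30, 10:30] = True  # e.g. a concept segmentation mask
print(unit_concept_iou(act, mask, threshold=0.8))
```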
arXiv Detail & Related papers (2020-09-10T17:59:10Z) - Adaptive Hierarchical Decomposition of Large Deep Networks [4.272649614101117]
As datasets get larger, a natural question is whether existing deep learning architectures can be extended to handle the 50,000+ classes thought to be perceptible by a typical human.
This paper introduces a framework that automatically analyzes and configures a family of smaller deep networks as a replacement to a singular, larger network.
The resulting smaller networks are highly scalable, can be trained in parallel, are more practical to train, and achieve higher classification accuracy.
arXiv Detail & Related papers (2020-07-17T21:04:50Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup that allows a neural network to learn both its size and topology during the course of gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels (a.k.a. counterfactual or contrasting examples), which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
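A hedged sketch of a gradient-supervision-style auxiliary objective: encourage the input gradient of the model's prediction to align with the direction from an example to its counterfactual. The model, data, and exact loss form are illustrative assumptions, not the paper's formulation:

```python
import torch
import torch.nn as nn

def gradient_supervision_loss(model, x, x_counterfactual):
    """Auxiliary loss aligning input gradients with example-to-counterfactual directions."""
    x = x.clone().requires_grad_(True)
    score = model(x).sum()
    grad = torch.autograd.grad(score, x, create_graph=True)[0]
    direction = x_counterfactual - x
    cos = nn.functional.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return (1.0 - cos).mean()

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(8, 20)
x_cf = x + 0.1 * torch.randn_like(x)  # stand-in for minimally different counterfactuals
print(gradient_supervision_loss(model, x, x_cf))
```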
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.