Hierarchical nucleation in deep neural networks
- URL: http://arxiv.org/abs/2007.03506v2
- Date: Thu, 9 Jul 2020 15:14:19 GMT
- Title: Hierarchical nucleation in deep neural networks
- Authors: Diego Doimo, Aldo Glielmo, Alessio Ansuini, Alessandro Laio
- Abstract summary: We study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs.
We find that the initial layers generate a unimodal probability density, getting rid of any structure irrelevant for classification.
In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts.
- Score: 67.85373725288136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional networks (DCNs) learn meaningful representations where
data that share the same abstract characteristics are positioned closer and
closer. Understanding these representations and how they are generated is of
unquestioned practical and theoretical interest. In this work we study the
evolution of the probability density of the ImageNet dataset across the hidden
layers in some state-of-the-art DCNs. We find that the initial layers generate
a unimodal probability density getting rid of any structure irrelevant for
classification. In subsequent layers density peaks arise in a hierarchical
fashion that mirrors the semantic hierarchy of the concepts. Density peaks
corresponding to single categories appear only close to the output and via a
very sharp transition which resembles the nucleation process of a heterogeneous
liquid. This process leaves a footprint in the probability density of the
output layer where the topography of the peaks allows reconstructing the
semantic relationships of the categories.
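The kind of analysis described in the abstract can be illustrated with a minimal sketch (not the authors' exact pipeline): collect activations at a given hidden layer for a sample of images, estimate a local density for each point with a simple k-nearest-neighbour estimator, and count how many points are density maxima among their neighbours. The layer names in the usage comments are hypothetical placeholders for any feature extractor.

```python
# Minimal sketch (not the authors' exact pipeline): count density peaks in the
# activations of one hidden layer using a k-nearest-neighbour density proxy.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def count_density_peaks(activations: np.ndarray, k: int = 10) -> int:
    """activations: (n_samples, n_features) array taken from one hidden layer."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(activations)
    dist, idx = nn.kneighbors(activations)        # idx[:, 0] is the point itself
    # Simple kNN density proxy: inverse of the distance to the k-th neighbour.
    density = 1.0 / (dist[:, -1] + 1e-12)
    # A point is a "peak" if its density is at least that of all its k neighbours.
    peaks = [i for i in range(len(density))
             if np.all(density[i] >= density[idx[i, 1:]])]
    return len(peaks)

# Hypothetical usage: compare an early and a late layer of any feature extractor.
# count_density_peaks(features_layer_3)    # early layers: close to a single peak
# count_density_peaks(features_layer_45)   # near the output: many category peaks
```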
Related papers
- Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP)
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore.
The proposed DDFP outperforms other feature-level perturbation designs and shows state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
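As a rough illustration of the idea (not the paper's DDFP implementation), one can estimate the feature density with a Gaussian kernel and nudge each feature a small step against the density gradient, i.e. toward the low-density regions where, under the low-density separation assumption, decision boundaries are expected to lie. The bandwidth and step size below are illustrative choices.

```python
# Rough illustration of a density-descending perturbation (not the paper's DDFP
# implementation): estimate the feature density with a Gaussian kernel and push
# each feature a small step against the density gradient, toward low-density regions.
import numpy as np

def density_descending_step(feats: np.ndarray, bandwidth: float = 1.0,
                            step: float = 0.1) -> np.ndarray:
    """feats: (n, d) feature vectors; returns perturbed copies of the same shape."""
    diffs = feats[:, None, :] - feats[None, :, :]            # (n, n, d) pairwise differences
    w = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))
    # Gradient of the Gaussian kernel density estimate at each point.
    grad = -(w[..., None] * diffs).sum(axis=1) / (bandwidth ** 2 * len(feats))
    direction = -grad                                        # descend the density
    norm = np.linalg.norm(direction, axis=1, keepdims=True) + 1e-12
    return feats + step * direction / norm
```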
arXiv Detail & Related papers (2024-03-11T06:59:05Z) - A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data [55.748186000425996]
Recent advancements show that diffusion models can generate high-quality images.
We study this phenomenon in a hierarchical generative model of data.
Our analysis characterises the relationship between time and scale in diffusion models.
arXiv Detail & Related papers (2024-02-26T19:52:33Z) - Data Representations' Study of Latent Image Manifolds [5.801621787540268]
We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers.
We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network.
arXiv Detail & Related papers (2023-05-31T10:49:16Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of neural networks measures the information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
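A small hedged sketch of how rank can be tracked empirically across layers: collect a batch of activations per layer and compute an effective rank from the singular-value spectrum. The tolerance and the variable `per_layer_features` are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: track an "effective rank" of layer activations from the
# singular-value spectrum; the tolerance below is an illustrative choice.
import numpy as np

def effective_rank(activations: np.ndarray, tol: float = 1e-3) -> int:
    """activations: (n_samples, n_features) matrix collected from one layer."""
    centered = activations - activations.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(s / (s.max() + 1e-12) > tol))

# Hypothetical usage: per_layer_features is a list of activation matrices.
# ranks = [effective_rank(f) for f in per_layer_features]   # tends to shrink with depth
```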
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
The resulting deep convolutional Gaussian mixture models (DCGMMs) can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
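As a toy sketch of the general idea rather than the paper's DCGMM architecture, the snippet below defines a Gaussian mixture whose parameters are ordinary tensors optimised by SGD on the negative log-likelihood, starting from random initial conditions; the isotropic covariance and all shapes are simplifying assumptions.

```python
# Toy sketch, not the paper's DCGMM architecture: a Gaussian mixture whose
# parameters are plain tensors optimised by SGD on the negative log-likelihood,
# starting from random initial conditions (isotropic components for simplicity).
import math
import torch

class GMMLayer(torch.nn.Module):
    def __init__(self, n_components: int, dim: int):
        super().__init__()
        self.means = torch.nn.Parameter(torch.randn(n_components, dim))
        self.log_sigma = torch.nn.Parameter(torch.zeros(n_components))
        self.logits = torch.nn.Parameter(torch.zeros(n_components))

    def log_likelihood(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[1]
        sq_dist = torch.cdist(x, self.means) ** 2                 # (batch, K)
        log_norm = -0.5 * d * (math.log(2 * math.pi) + 2 * self.log_sigma)
        log_comp = log_norm - 0.5 * sq_dist / torch.exp(2 * self.log_sigma)
        log_weights = torch.log_softmax(self.logits, dim=0)
        return torch.logsumexp(log_weights + log_comp, dim=1)     # (batch,)

# Training sketch: minimise -model.log_likelihood(features).mean() with torch.optim.SGD.
```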
arXiv Detail & Related papers (2022-03-21T14:53:57Z) - Understanding the Distributions of Aggregation Layers in Deep Neural Networks [8.784438985280092]
Aggregation functions serve as an important mechanism for consolidating deep features into a more compact representation.
In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features have a direct influence on the performance of a deep net.
We propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation.
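To make the setting concrete, here is a brief sketch (not the paper's formulation) of global aggregation via average pooling, after which one can inspect the empirical distribution of the aggregated values; the tensor shapes and random inputs are illustrative.

```python
# Brief sketch of the setting (not the paper's formulation): global average
# pooling collapses a spatial feature map into a vector, and one can then look
# at the empirical distribution of the aggregated values; shapes are illustrative.
import numpy as np

def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """feature_maps: (batch, channels, h, w) -> (batch, channels)."""
    return feature_maps.mean(axis=(2, 3))

fmap = np.random.rand(8, 64, 7, 7)                   # stand-in for real activations
pooled = global_average_pool(fmap)
hist, edges = np.histogram(pooled.ravel(), bins=20)  # empirical output distribution
```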
arXiv Detail & Related papers (2021-07-09T14:23:57Z) - Diffusion Mechanism in Residual Neural Network: Theory and Applications [12.573746641284849]
In many learning tasks with limited training samples, diffusion connects the labeled and unlabeled data points.
We propose a novel diffusion residual network (Diff-ResNet) that internally introduces diffusion into the architecture of neural networks.
Under the structured data assumption, it is proved that the proposed diffusion block can increase the distance-diameter ratio, thereby improving the separability of inter-class points.
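A schematic sketch of such a diffusion step, not the authors' Diff-ResNet code: each feature vector in a batch is pulled toward its neighbours on a similarity graph, which tends to tighten within-class clusters. The kernel width and step strength are illustrative choices.

```python
# Schematic sketch of a diffusion step (not the authors' Diff-ResNet code):
# each feature vector in a batch is pulled toward its neighbours on a similarity
# graph; kernel width and step strength below are illustrative choices.
import numpy as np

def diffusion_step(x: np.ndarray, strength: float = 0.5, sigma: float = 1.0) -> np.ndarray:
    """x: (n, d) batch of feature vectors; one step of graph diffusion."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    w = w / (w.sum(axis=1, keepdims=True) + 1e-12)   # row-normalised neighbour weights
    return x + strength * (w @ x - x)                # x_i += gamma * sum_j w_ij (x_j - x_i)

# In a residual block, such a step could follow the usual update:
# x = x + mlp(x); x = diffusion_step(x)
```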
arXiv Detail & Related papers (2021-05-07T10:42:59Z) - Hierarchical Graph Capsule Network [78.4325268572233]
We propose hierarchical graph capsule network (HGCN) that can jointly learn node embeddings and extract graph hierarchies.
To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (parts) and higher-level capsules (wholes).
arXiv Detail & Related papers (2020-12-16T04:13:26Z) - Kernelized dense layers for facial expression recognition [10.98068123467568]
We propose a Kernelized Dense Layer (KDL) which captures higher-order feature interactions instead of conventional linear relations.
We show that our model achieves competitive results with respect to the state-of-the-art approaches.
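As a hedged sketch of a kernelized dense unit, the layer below evaluates a polynomial kernel between the input and each weight vector, so the response carries higher-order feature interactions rather than a purely linear combination; the kernel degree and bias are illustrative details, not the paper's exact formulation.

```python
# Hedged sketch of a kernelized dense unit: each output unit evaluates a
# polynomial kernel between the input and its weight vector; degree and bias
# are illustrative details, not the paper's exact formulation.
import torch

class PolyKernelDense(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int, degree: int = 2):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        self.degree = degree

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (x . w_k + b_k) ** d  -- a polynomial kernel per output unit.
        return (x @ self.weight.t() + self.bias) ** self.degree
```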
arXiv Detail & Related papers (2020-09-22T21:02:00Z) - Layer-stacked Attention for Heterogeneous Network Embedding [0.0]
Layer-stacked ATTention Embedding (LATTE) is an architecture that automatically decomposes higher-order meta relations at each layer.
LATTE offers a more interpretable aggregation scheme for nodes of different types at different neighborhood ranges.
In both transductive and inductive node classification tasks, LATTE can achieve state-of-the-art performance compared to existing approaches.
arXiv Detail & Related papers (2020-09-17T05:13:41Z)