Data Representations' Study of Latent Image Manifolds
- URL: http://arxiv.org/abs/2305.19730v2
- Date: Thu, 16 Nov 2023 10:41:03 GMT
- Title: Data Representations' Study of Latent Image Manifolds
- Authors: Ilya Kaufman and Omri Azencot
- Abstract summary: We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers.
We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network.
- Score: 5.801621787540268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been demonstrated to achieve phenomenal success in
many domains, and yet their inner mechanisms are not well understood. In this
paper, we investigate the curvature of image manifolds, i.e., the manifold
deviation from being flat in its principal directions. We find that
state-of-the-art trained convolutional neural networks for image classification
have a characteristic curvature profile along layers: an initial steep
increase, followed by a long plateau, and then another
increase. In contrast, this behavior does not appear in untrained networks in
which the curvature flattens. We also show that the curvature gap between the
last two layers has a strong correlation with the generalization capability of
the network. Moreover, we find that the intrinsic dimension of latent codes is
not necessarily indicative of curvature. Finally, we observe that common
regularization methods such as mixup yield flatter representations when
compared to other methods. Our experiments show consistent results over a
variety of deep learning architectures and multiple data sets. Our code is
publicly available at https://github.com/azencot-group/CRLM
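The paper's estimator is implemented in the linked CRLM repository; as a rough, generic illustration of the idea only, the sketch below measures a per-layer "deviation from local flatness" proxy (the variance left outside each neighborhood's top principal directions). The neighborhood size k, the local dimension d, and the local-PCA proxy itself are assumptions, not the paper's exact method.

```python
# Minimal sketch (assumption: latent codes already extracted per layer).
# Curvature proxy: fraction of local variance outside the top-d principal
# directions of each point's neighborhood -- NOT the paper's exact estimator.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_flatness_deviation(codes, k=20, d=10):
    """codes: (N, D) latent codes of one layer; returns mean residual-variance ratio."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(codes).kneighbors(codes)
    ratios = []
    for i in range(codes.shape[0]):
        patch = codes[idx[i, 1:]] - codes[i]               # centered local neighborhood
        s = np.linalg.svd(patch, compute_uv=False) ** 2    # local principal variances
        ratios.append(s[d:].sum() / s.sum())               # mass outside the top-d directions
    return float(np.mean(ratios))

def curvature_profile(layer_codes, k=20, d=10):
    """layer_codes: list of (N, D_l) arrays, one per layer; returns one value per layer."""
    return [local_flatness_deviation(c, k=k, d=d) for c in layer_codes]

# Example on random placeholders standing in for three layers' codes:
rng = np.random.default_rng(0)
print(curvature_profile([rng.normal(size=(500, 64)) for _ in range(3)]))
```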
Related papers
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
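As a toy illustration of this setting (not the paper's asymptotic analysis), the sketch below freezes a random deep feature map and learns only a ridge-regression readout, then reports its test error. The widths, depth, activation, and ridge penalty are illustrative assumptions.

```python
# Minimal sketch: frozen random deep features + learned ridge readout.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, width, depth, lam = 2000, 500, 50, 200, 3, 1e-2

def random_feature_map(X, depth=3, width=200, seed=0):
    rng = np.random.default_rng(seed)
    h = X
    for _ in range(depth):
        W = rng.normal(size=(h.shape[1], width)) / np.sqrt(h.shape[1])
        h = np.tanh(h @ W)                               # frozen random layer
    return h

# toy regression task: y depends linearly on the inputs plus noise
X = rng.normal(size=(n_train + n_test, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n_train + n_test)

F = random_feature_map(X, depth, width)
F_tr, F_te, y_tr, y_te = F[:n_train], F[n_train:], y[:n_train], y[n_train:]

# ridge-regression readout (closed form); only this layer is learned
a = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(width), F_tr.T @ y_tr)
print(f"test error of the learned readout: {np.mean((F_te @ a - y_te) ** 2):.4f}")
```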
arXiv Detail & Related papers (2024-02-21T18:35:27Z)
- Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
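As a literal reading of this measurement (not the paper's analysis), the sketch below computes a within-/between-class variance ratio per layer; the random linear layers only show how the quantity is computed, since the geometric decay is claimed for trained networks.

```python
# Minimal sketch: within-/between-class variance ratio tracked across layers.
import numpy as np

def within_between_ratio(feats, labels):
    """feats: (N, D) features of one layer; labels: (N,) integer class labels."""
    mu = feats.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = feats[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - mu) ** 2).sum()
    return within / between

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(+1, 1, size=(100, 20)),
                    rng.normal(-1, 1, size=(100, 20))])
y = np.array([0] * 100 + [1] * 100)

feats, ratios = X, []
for _ in range(5):                       # five random linear layers (no bias, no activation)
    W = rng.normal(size=(feats.shape[1], 20)) / np.sqrt(feats.shape[1])
    feats = feats @ W
    ratios.append(within_between_ratio(feats, y))
print(ratios)                            # the paper predicts geometric decay for *trained* deep linear nets
```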
arXiv Detail & Related papers (2023-11-06T09:00:38Z)
- Effects of Data Geometry in Early Deep Learning [16.967930721746672]
Deep neural networks can approximate functions on different types of data, from images to graphs, with varied underlying structure.
We study how a random neural network with piecewise-linear activations splits the data manifold into regions where the network behaves as a linear function.
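A minimal sketch of that picture, assuming an untrained random ReLU network and a toy 1-D data manifold (a circle): counting the distinct activation patterns met along the curve counts the linear regions it crosses. Architecture sizes and sampling density are illustrative.

```python
# Minimal sketch: linear regions of a random ReLU network along a toy 1-D manifold.
import numpy as np

rng = np.random.default_rng(0)
d_in, width, depth, n_samples = 10, 64, 3, 5000

# random untrained ReLU network
Ws = [rng.normal(size=(d_in if i == 0 else width, width)) / np.sqrt(width) for i in range(depth)]
bs = [rng.normal(size=width) * 0.1 for _ in range(depth)]

# a circle embedded in the first two input coordinates: a toy data manifold
t = np.linspace(0, 2 * np.pi, n_samples)
X = np.zeros((n_samples, d_in))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)

def activation_pattern(x):
    h, bits = x, []
    for W, b in zip(Ws, bs):
        pre = h @ W + b
        bits.append(pre > 0)             # which ReLUs are active at this point
        h = np.maximum(pre, 0)
    return np.concatenate(bits)

patterns = {activation_pattern(x).tobytes() for x in X}
print(f"linear regions crossed by the circle: {len(patterns)}")
```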
arXiv Detail & Related papers (2022-12-29T17:32:05Z)
- Convolutional Neural Networks on Manifolds: From Graphs and Back [122.06927400759021]
We propose a manifold neural network (MNN) composed of a bank of manifold convolutional filters and point-wise nonlinearities.
In summary, we treat the manifold model as the limit of large graphs to construct MNNs, and recover graph neural networks by discretizing MNNs.
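A minimal sketch of the discrete side of this construction, under common graph-signal-processing assumptions (not the paper's exact operator): a layer built from a bank of Laplacian-polynomial filters followed by a point-wise nonlinearity.

```python
# Minimal sketch: graph convolutional layer as a Laplacian-polynomial filter bank.
import numpy as np

def graph_conv_layer(L, X, H):
    """L: (n, n) graph Laplacian; X: (n, f_in) signals; H: (K, f_in, f_out) filter taps."""
    Z, LkX = 0.0, X
    for k in range(H.shape[0]):          # sum_k  L^k X H_k
        Z = Z + LkX @ H[k]
        LkX = L @ LkX
    return np.maximum(Z, 0)              # point-wise nonlinearity (ReLU)

rng = np.random.default_rng(0)
n, f_in, f_out, K = 100, 3, 8, 4
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T           # random undirected graph
L = np.diag(A.sum(1)) - A                # combinatorial Laplacian
X = rng.normal(size=(n, f_in))
H = rng.normal(size=(K, f_in, f_out)) * 0.1
print(graph_conv_layer(L, X, H).shape)   # (100, 8)
```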
arXiv Detail & Related papers (2022-10-01T21:17:39Z)
- Training invariances and the low-rank phenomenon: beyond linear networks [44.02161831977037]
We show that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-$1$ matrices.
Extending beyond the linear case, this gives the first rigorous proof of a low-rank phenomenon for nonlinear ReLU-activated feedforward networks.
Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
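A hedged empirical companion to this statement: training a small deep linear network with logistic loss on separable toy data and inspecting the singular values of each weight matrix. Widths, learning rate, and step count are arbitrary, and the rank collapse is only asymptotic, so the printout is a rough check rather than a verification.

```python
# Minimal sketch: weight-rank inspection after training a deep linear network
# with logistic loss on linearly separable data.
import torch

torch.manual_seed(0)
n, d, width = 200, 10, 32
w_star = torch.randn(d)
X = torch.randn(n, d)
y = torch.sign(X @ w_star)               # linearly separable labels in {-1, +1}

net = torch.nn.Sequential(               # deep *linear* network (no activations)
    torch.nn.Linear(d, width, bias=False),
    torch.nn.Linear(width, width, bias=False),
    torch.nn.Linear(width, 1, bias=False),
)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(5000):
    loss = torch.nn.functional.softplus(-y * net(X).squeeze()).mean()  # logistic loss
    opt.zero_grad(); loss.backward(); opt.step()

for i, layer in enumerate(net):
    s = torch.linalg.svdvals(layer.weight)
    print(f"layer {i}: singular values above 1% of the largest: {(s > 1e-2 * s[0]).sum().item()}")
```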
arXiv Detail & Related papers (2022-01-28T07:31:19Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
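One simple way to probe for such groups (an assumption, not the paper's protocol) is to correlate neuron responses over a batch and greedily group highly correlated neurons; the synthetic activations below stand in for features extracted from a real wide network.

```python
# Minimal sketch: grouping last-hidden-layer neurons by response correlation.
import numpy as np

def neuron_groups(acts, threshold=0.95):
    """acts: (N, width) activations of the last hidden layer on N inputs."""
    corr = np.corrcoef(acts.T)                     # (width, width) neuron-neuron correlation
    width = corr.shape[0]
    group_of = -np.ones(width, dtype=int)
    for i in range(width):                         # greedy grouping by high correlation
        if group_of[i] == -1:
            group_of[i] = i
        for j in range(i + 1, width):
            if group_of[j] == -1 and corr[i, j] > threshold:
                group_of[j] = group_of[i]
    sizes = np.bincount(group_of)
    return sizes[sizes > 0]

# toy stand-in: 16 "source" signals duplicated 4x with small independent noise
rng = np.random.default_rng(0)
source = rng.normal(size=(1000, 16))
acts = np.repeat(source, 4, axis=1) + 0.05 * rng.normal(size=(1000, 64))
print(neuron_groups(acts))                         # roughly sixteen groups of ~4 neurons
```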
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
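A minimal sketch of a block in this spirit (not the paper's inception block): a spatial path that aggregates over the joint graph and a temporal path that convolves over frames, summed and passed through a nonlinearity. Tensor layout and sizes follow the common skeleton convention and are assumptions.

```python
# Minimal sketch: a spatial + temporal building block for skeleton sequences.
import torch

class STBlock(torch.nn.Module):
    def __init__(self, c_in, c_out, A, t_kernel=9):
        super().__init__()
        self.register_buffer("A", A)                       # (V, V) joint adjacency
        self.spatial = torch.nn.Conv2d(c_in, c_out, 1)     # 1x1 conv after joint aggregation
        self.temporal = torch.nn.Conv2d(
            c_in, c_out, (t_kernel, 1), padding=(t_kernel // 2, 0))
        self.relu = torch.nn.ReLU()

    def forward(self, x):                                  # x: (batch, channels, time, joints)
        spatial = self.spatial(torch.einsum("nctv,vw->nctw", x, self.A))
        temporal = self.temporal(x)
        return self.relu(spatial + temporal)               # aggregate the two paths

V = 25                                                     # e.g. 25 skeleton joints
A = torch.eye(V)                                           # placeholder adjacency (self-loops only)
block = STBlock(c_in=3, c_out=64, A=A)
out = block(torch.randn(8, 3, 100, V))                     # 8 clips, 3 coordinates, 100 frames
print(out.shape)                                           # torch.Size([8, 64, 100, 25])
```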
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Hierarchical nucleation in deep neural networks [67.85373725288136]
We study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs.
We find that the initial layers generate a unimodal probability density, discarding any structure irrelevant for classification.
In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts.
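A crude stand-in for this kind of analysis (the paper's density-peak machinery is more sophisticated): a k-NN density estimate on one layer's representations, counting points that are denser than all of their neighbors. The blob data below is a placeholder for real hidden-layer features.

```python
# Minimal sketch: k-NN density estimate and local density maxima on one layer.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_peaks(feats, k=10):
    """feats: (N, D) representations of one layer; returns number of local density maxima."""
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(feats).kneighbors(feats)
    density = 1.0 / (dist[:, -1] + 1e-12)          # inverse distance to the k-th neighbor
    peaks = sum(density[i] >= density[idx[i, 1:]].max() for i in range(len(feats)))
    return int(peaks)

rng = np.random.default_rng(0)
blobs = np.concatenate([rng.normal(c, 0.3, size=(200, 32)) for c in (-2.0, 0.0, 2.0)])
print(density_peaks(blobs))                        # local maxima of the (noisy) density estimate
```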
arXiv Detail & Related papers (2020-07-07T14:42:18Z)
- Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
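A minimal sketch of embedding propagation over a similarity graph, under standard label-propagation-style assumptions (RBF kernel, symmetric normalization, a smoothing weight alpha); the exact kernel and normalization used in the paper may differ.

```python
# Minimal sketch: smoothing embeddings over a similarity graph.
import numpy as np

def propagate_embeddings(X, sigma=1.0, alpha=0.5):
    """X: (N, D) embeddings; returns smoothed embeddings of the same shape."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                           # no self-similarity
    d = A.sum(1)
    S = A / np.sqrt(np.outer(d, d))                    # symmetric normalization D^-1/2 A D^-1/2
    N = X.shape[0]
    return np.linalg.solve(np.eye(N) - alpha * S, X)   # Z = (I - alpha*S)^-1 X

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Z = propagate_embeddings(X)
print(X.shape, Z.shape)                                # both (60, 8)
```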
arXiv Detail & Related papers (2020-03-09T13:51:09Z)