What does a deep neural network confidently perceive? The effective
dimension of high certainty class manifolds and their low confidence
boundaries
- URL: http://arxiv.org/abs/2210.05546v1
- Date: Tue, 11 Oct 2022 15:42:06 GMT
- Title: What does a deep neural network confidently perceive? The effective
dimension of high certainty class manifolds and their low confidence
boundaries
- Authors: Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel S. Schoenholz
- Abstract summary: Deep neural network classifiers partition input space into high confidence regions for each class.
We exploit the notions of Gaussian width and Gordon's escape theorem to tractably estimate the effective dimension of CMs.
We show several connections between the dimension of CMs, generalization, and robustness.
- Score: 53.45325448933401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network classifiers partition input space into high confidence
regions for each class. The geometry of these class manifolds (CMs) is widely
studied and intimately related to model performance; for example, the margin
depends on CM boundaries. We exploit the notions of Gaussian width and Gordon's
escape theorem to tractably estimate the effective dimension of CMs and their
boundaries through tomographic intersections with random affine subspaces of
varying dimension. We show several connections between the dimension of CMs,
generalization, and robustness. In particular we investigate how CM dimension
depends on 1) the dataset, 2) architecture (including ResNet, WideResNet &
Vision Transformer), 3) initialization, 4) stage of training, 5) class, 6)
network width, 7) ensemble size, 8) label randomization, 9) training set size,
and 10) robustness to data corruption. Together a picture emerges that higher
performing and more robust models have higher dimensional CMs. Moreover, we
offer a new perspective on ensembling via intersections of CMs. Our code is at
https://github.com/stanislavfort/slice-dice-optimize/
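The tomographic probe described in the abstract can be illustrated with a minimal numpy sketch: draw a random affine subspace through a point, sample points on a sphere inside that subspace, and count how many stay in the high-confidence region of a class. This is a toy illustration under assumptions, not the authors' implementation (which is at the GitHub link above); `predict_proba` is a hypothetical model interface, and the radius, threshold, and sample counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def subspace_hits(predict_proba, cls, center, dim, ambient,
                  n_samples=200, radius=1.0, thresh=0.9):
    """Fraction of points at distance `radius` from `center`, within a
    random `dim`-dimensional affine subspace through `center`, that the
    model assigns to class `cls` with probability >= `thresh`.

    Intuition from Gordon's escape theorem: a random subspace is likely
    to intersect a set on the sphere once the subspace dimension plus
    the squared Gaussian width of the set exceeds the ambient dimension,
    so sweeping `dim` gives a handle on the set's effective dimension.
    """
    # Random orthonormal basis of the subspace directions.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient, dim)))
    hits = 0
    for _ in range(n_samples):
        coeffs = rng.standard_normal(dim)
        coeffs *= radius / np.linalg.norm(coeffs)  # point on the sphere in the subspace
        x = center + basis @ coeffs
        hits += predict_proba(x)[cls] >= thresh
    return hits / n_samples
```

Sweeping `dim` from small to large and recording where the hit fraction rises gives the tomographic dimension estimate the abstract alludes to.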
Related papers
- Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods [22.835933033524718]
We develop a methodology to measure the similarity between arbitrary parameter masks and Hessian eigenspaces via Grassmannian metrics.
Our experiments reveal an *overlap* between magnitude parameter masks and top Hessian eigenspaces consistently higher than chance-level.
Our work provides a methodology to approximate and analyze deep learning Hessians at scale, as well as a novel insight on the structure of their eigenspace.
arXiv Detail & Related papers (2025-04-20T18:29:39Z)
- Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV [50.616892315086574]
This paper proposes two novel datasets: SlowTV and CribsTV.
These are large-scale datasets curated from publicly available YouTube videos, containing a total of 2M training frames.
We leverage these datasets to tackle the challenging task of zero-shot generalization.
arXiv Detail & Related papers (2024-03-03T17:29:03Z)
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Data Representations' Study of Latent Image Manifolds [5.801621787540268]
We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers.
We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network.
arXiv Detail & Related papers (2023-05-31T10:49:16Z)
- Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z)
- Origami in N dimensions: How feed-forward networks manufacture linear separability [1.7404865362620803]
We show that a feed-forward architecture has one primary tool at hand to achieve separability: progressive folding of the data manifold in unoccupied higher dimensions.
We argue that an alternative method based on shear, requiring very deep architectures, plays only a small role in real-world networks.
Based on the mechanistic insight, we predict that the progressive generation of separability is necessarily accompanied by neurons showing mixed selectivity and bimodal tuning curves.
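The folding mechanism can be illustrated with a minimal numpy sketch (a hypothetical toy example, not code from the paper): a single ReLU layer folds the real line at the origin into an unoccupied second dimension, after which two interleaved classes become linearly separable.

```python
import numpy as np

def fold(x):
    """One ReLU layer that 'folds' the real line at the origin into an
    unoccupied second dimension: h = (relu(x), relu(-x))."""
    return np.stack([np.maximum(x, 0), np.maximum(-x, 0)], axis=1)

# Interleaved 1D classes, |x| < 1 vs |x| > 1, are not linearly separable
# on the line itself.
x = np.array([-1.5, -0.5, 0.5, 1.5])
labels = np.abs(x) > 1

# After folding, the coordinate sum recovers |x|, so a single linear
# threshold separates the classes.
h = fold(x)
separated = (h.sum(axis=1) > 1) == labels
```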
arXiv Detail & Related papers (2022-03-21T21:33:55Z)
- Exploring the Common Principal Subspace of Deep Features in Neural Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosine close to $1.0$) have been found in the comparisons between any two DNNs trained with different algorithms/architectures.
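A minimal sketch of the comparison (using a plain top principal direction as a stand-in for the paper's $\mathcal{P}$-vector; the exact construction in the paper may differ): extract the leading principal direction of each network's features on the same inputs, then take the cosine of the angle between them.

```python
import numpy as np

def principal_vector(features):
    """Top principal direction of an (n_samples, n_features) feature
    matrix, found via SVD of the centered data."""
    centered = features - features.mean(axis=0)
    # The first right-singular vector spans the top principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def subspace_cosine(feats_a, feats_b):
    """|cos angle| between the principal directions of two networks'
    features on the same inputs; values near 1.0 indicate a shared
    principal subspace."""
    va, vb = principal_vector(feats_a), principal_vector(feats_b)
    return abs(va @ vb)
```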
arXiv Detail & Related papers (2021-10-06T15:48:32Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence)
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification [43.35966675372692]
We show different fusion strategies, how to train deep networks, and how to build the network architecture.
Our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2020-08-12T17:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.