Relative stability toward diffeomorphisms in deep nets indicates performance
- URL: http://arxiv.org/abs/2105.02468v1
- Date: Thu, 6 May 2021 07:03:30 GMT
- Title: Relative stability toward diffeomorphisms in deep nets indicates performance
- Authors: Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart
- Abstract summary: We show that stability toward diffeomorphisms does not strongly correlate to performance on benchmark data sets of images.
We find that the stability toward diffeomorphisms relative to that of generic transformations $R_f$ correlates remarkably with the test error $\epsilon_t$.
For CIFAR10 and 15 known architectures, we find $\epsilon_t \approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance.
- Score: 66.51503682738931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding why deep nets can classify data in large dimensions remains a
challenge. It has been proposed that they do so by becoming stable to
diffeomorphisms, yet existing empirical measurements support that it is often
not the case. We revisit this question by defining a maximum-entropy
distribution on diffeomorphisms, which allows one to study typical diffeomorphisms
of a given norm. We confirm that stability toward diffeomorphisms does not
strongly correlate to performance on four benchmark data sets of images. By
contrast, we find that the stability toward diffeomorphisms relative to that of
generic transformations $R_f$ correlates remarkably with the test error
$\epsilon_t$. It is of order unity at initialization but decreases by several
decades during training for state-of-the-art architectures. For CIFAR10 and 15
known architectures, we find $\epsilon_t\approx 0.2\sqrt{R_f}$, suggesting that
obtaining a small $R_f$ is important to achieve good performance. We study how
$R_f$ depends on the size of the training set and compare it to a simple model
of invariant learning.
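To make the relative stability $R_f$ concrete: it compares how much the network output moves under a small random diffeomorphism of the input to how much it moves under a generic perturbation of the same norm. Below is a minimal sketch, not the authors' released code, of how such a ratio could be estimated; the sine-mode displacement field, the "temperature" `T`, the mode `cutoff`, and the stand-in model `f` are illustrative assumptions rather than the paper's exact maximum-entropy ensemble.

```python
# Minimal sketch (illustrative, not the paper's code): estimate the relative
# stability R_f of a model f on a batch of 2D grayscale images.
import numpy as np
from scipy.ndimage import map_coordinates


def random_diffeo(img, T=1e-3, cutoff=3, rng=None):
    """Warp a 2D image with a smooth random displacement field made of a few
    low-frequency sine modes -- a simplified stand-in for the paper's
    maximum-entropy diffeomorphisms of a given norm ("temperature" T)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    y, x = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    dy, dx = np.zeros_like(y), np.zeros_like(x)
    for i in range(1, cutoff + 1):
        for j in range(1, cutoff + 1):
            c_y, c_x = rng.normal(scale=np.sqrt(T), size=2)
            mode = np.sin(np.pi * i * y) * np.sin(np.pi * j * x)
            dy += c_y * mode
            dx += c_x * mode
    coords = np.stack([(y + dy) * (h - 1), (x + dx) * (w - 1)])
    return map_coordinates(img, coords, order=1, mode="nearest")


def relative_stability(f, images, n_samples=8, rng=None):
    """R_f = E||f(tau x) - f(x)||^2 / E||f(x + eta) - f(x)||^2, where tau is a
    small diffeomorphism and eta is isotropic noise of matched pixel norm."""
    rng = np.random.default_rng(0) if rng is None else rng
    num = den = 0.0
    for x in images:
        fx = f(x)
        for _ in range(n_samples):
            xt = random_diffeo(x, rng=rng)
            eta = rng.normal(size=x.shape)
            eta *= np.linalg.norm(xt - x) / (np.linalg.norm(eta) + 1e-12)
            num += np.sum((f(xt) - fx) ** 2)
            den += np.sum((f(x + eta) - fx) ** 2)
    return num / den
```

For intuition on the reported fit: an architecture that reaches a relative stability of $R_f \approx 0.04$ after training would, under the relation $\epsilon_t \approx 0.2\sqrt{R_f}$, be expected to sit near a 4% test error on CIFAR10.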
Related papers
- Isomorphic Pruning for Vision Models [56.286064975443026]
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures.
We present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures.
arXiv Detail & Related papers (2024-07-05T16:14:53Z)
- Deep Invertible Approximation of Topologically Rich Maps between Manifolds [17.60434807901964]
We show how to design neural networks that allow for stable universal approximation of maps between topologically interesting manifolds.
By exploiting the topological parallels between locally bilipschitz maps, covering spaces, and local homeomorphisms, we find that a novel network of the form $\mathcal{T} \circ p \circ \mathcal{E}$ is a universal approximator of local diffeomorphisms.
We also outline possible extensions of our architecture to address molecular imaging of molecules with symmetries.
arXiv Detail & Related papers (2022-10-02T17:14:43Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z)
- Equivariant Discrete Normalizing Flows [10.867162810786361]
We focus on building equivariant normalizing flows using discrete layers.
We introduce two new equivariant flows: $G$-coupling Flows and $G$-Residual Flows.
Our construction of $G$-Residual Flows is also universal, in the sense that we prove a $G$-equivariant diffeomorphism can be exactly mapped by a $G$-residual flow (a minimal numerical check of the underlying equivariance property is sketched after this list).
arXiv Detail & Related papers (2021-10-16T20:16:00Z)
- Learning with invariances in random features and kernel models [19.78800773518545]
We introduce two classes of models: invariant random features and invariant kernel methods.
We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension.
We show that exploiting invariance in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures.
arXiv Detail & Related papers (2021-02-25T23:06:21Z)
- A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activation ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error.
arXiv Detail & Related papers (2021-01-18T15:40:27Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
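As referenced in the $G$-Residual Flows entry above, the structural point is that a residual map $f(x) = x + g(x)$ inherits equivariance from its residual block: if $g(\rho(h)x) = \rho(h)g(x)$ for every group element $h$, then $f$ commutes with the group action as well, and it is invertible when $g$ is a contraction. A minimal numerical check of this property, using permutation equivariance and a Deep-Sets-style block as illustrative stand-ins rather than the paper's construction:

```python
# Minimal sketch: a permutation-equivariant residual map f(x) = x + g(x).
# The Deep-Sets-style block and its parameters are illustrative stand-ins,
# not the paper's G-Residual Flow construction.
import numpy as np


def g(x, a=0.3, b=0.2):
    # Elementwise term plus a symmetric (mean) term, so permuting the inputs
    # permutes the outputs in the same way.
    return np.tanh(a * x + b * x.mean())


def f(x):
    # Residual map; invertible when g is a contraction (|a| + |b| < 1 keeps
    # the Lipschitz constant of g below 1, the usual residual-flow condition).
    return x + g(x)


rng = np.random.default_rng(0)
x = rng.normal(size=6)
perm = rng.permutation(6)

print(np.allclose(f(x[perm]), f(x)[perm]))  # True: f(P x) == P f(x)
```

Swapping the permutation action for any other linear group representation leaves the argument unchanged, provided the residual block is built from equivariant operations.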
This list is automatically generated from the titles and abstracts of the papers on this site.