Separation and Concentration in Deep Networks
- URL: http://arxiv.org/abs/2012.10424v2
- Date: Mon, 15 Mar 2021 15:06:51 GMT
- Title: Separation and Concentration in Deep Networks
- Authors: John Zarka, Florentin Guth, Stéphane Mallat
- Abstract summary: Deep neural network classifiers progressively separate class distributions around their mean.
For image classification, we show that separation of class means can be achieved with rectified wavelet tight frames that are not learned.
The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.
- Score: 1.8620637029128544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical experiments demonstrate that deep neural network classifiers
progressively separate class distributions around their mean, achieving linear
separability on the training set, and increasing the Fisher discriminant ratio.
We explain this mechanism with two types of operators. We prove that a
rectifier without biases applied to sign-invariant tight frames can separate
class means and increase Fisher ratios. On the opposite, a soft-thresholding on
tight frames can reduce within-class variabilities while preserving class
means. Variance reduction bounds are proved for Gaussian mixture models. For
image classification, we show that separation of class means can be achieved
with rectified wavelet tight frames that are not learned. This defines a
scattering transform. Learning $1 \times 1$ convolutional tight frames along
scattering channels and applying a soft-thresholding reduces within-class
variabilities. The resulting scattering network reaches the classification
accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no
learned biases.
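To make the two operators concrete, here is a minimal NumPy sketch, not the authors' code: the tight frame, the toy Gaussian mixture, and the threshold value are illustrative assumptions. It applies a bias-free rectifier to coefficients in a sign-invariant tight frame (the separation step) and a soft-thresholding in the same frame followed by reconstruction (the concentration step), and reports the Fisher discriminant ratio after each operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_ratio(x, y):
    """Scalar Fisher discriminant ratio: between-class over within-class scatter."""
    mu0, mu1 = x[y == 0].mean(0), x[y == 1].mean(0)
    within = x[y == 0].var(0).sum() + x[y == 1].var(0).sum()
    return np.sum((mu1 - mu0) ** 2) / within

# Toy two-class Gaussian mixture in dimension d (class means at +/- 0.5 * mu).
d, n = 16, 2000
mu = rng.normal(size=d)
y = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, d)) + np.where(y[:, None] == 1, mu, -mu) * 0.5

# Sign-invariant Parseval tight frame F (rows +/- those of D, with D^T D = I), so F^T F = I.
m = 4 * d
D = np.linalg.qr(rng.normal(size=(m, d)))[0]   # m x d with orthonormal columns
F = np.vstack([D, -D]) / np.sqrt(2)

# Separation: rectifier without biases applied to the frame coefficients.
x_sep = np.maximum(x @ F.T, 0.0)

# Concentration: soft-thresholding of the frame coefficients, then reconstruction by F^T.
def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

x_conc = soft_threshold(x @ F.T, 0.5) @ F      # 0.5 is an arbitrary illustrative threshold

for name, feats in [("raw input", x), ("ReLU o frame", x_sep), ("soft-thresholded", x_conc)]:
    print(f"Fisher ratio, {name}: {fisher_ratio(feats, y):.3f}")
```

The printed ratios let one inspect the separation and concentration effects on this toy mixture; the paper's variance-reduction bounds for Gaussian mixture models are of course not reproduced by this sketch.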
Related papers
- The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unstable predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
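For background on the randomized smoothing this entry builds on, a minimal sketch of the standard Monte Carlo prediction and the simplified certified-radius formula of Cohen et al.; it is not this paper's certification procedure, and `base_classifier`, `sigma`, and the sample count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, n_classes=10, rng=None):
    """Majority vote of base_classifier over Gaussian perturbations of the input x."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        counts[base_classifier(x + sigma * rng.normal(size=x.shape))] += 1
    return int(counts.argmax()), counts / n_samples      # top class and vote frequencies

def certified_radius(p_top, sigma):
    """Simplified l2 radius sigma * Phi^{-1}(p_top); meaningful only when p_top > 0.5."""
    return sigma * norm.ppf(p_top)

# Toy usage with a dummy two-class "classifier".
dummy = lambda z: int(z.sum() > 0)
label, freq = smoothed_predict(dummy, np.ones(8), sigma=0.5, n_classes=2)
print(label, certified_radius(min(freq.max(), 0.999), sigma=0.5))
```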
arXiv Detail & Related papers (2023-09-28T22:41:47Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Discriminability-enforcing loss to improve representation learning [20.4701676109641]
We introduce a new loss term inspired by the Gini impurity to minimize the entropy of individual high-level features.
Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes.
Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone.
arXiv Detail & Related papers (2022-02-14T22:31:37Z) - Instance-based Label Smoothing For Better Calibrated Classification Networks [3.388509725285237]
Label smoothing is widely used in deep neural networks for multi-class classification.
We take inspiration from both label smoothing and self-distillation.
We propose two novel instance-based label smoothing approaches.
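For reference, a sketch of standard uniform label smoothing, the baseline these instance-based variants start from; the smoothing factor is an illustrative choice and the paper's two instance-based schemes are not reproduced here.

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Uniform label smoothing: (1 - eps) on the true class, eps spread evenly over all classes."""
    return (1.0 - eps) * np.eye(n_classes)[y] + eps / n_classes

def cross_entropy(probs, targets):
    """Mean cross-entropy against (possibly smoothed) soft targets."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

# Usage: targets = smooth_labels(np.array([0, 2, 1]), n_classes=3); loss = cross_entropy(p, targets)
```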
arXiv Detail & Related papers (2021-10-11T15:33:23Z) - Phase Collapse in Neural Networks [1.8620637029128544]
Deep convolutional image classifiers progressively transform the spatial variability into a smaller number of channels, which linearly separates all classes.
This paper demonstrates that a different mechanism, phase collapse, explains this ability to progressively eliminate spatial variability.
It is justified by explaining how iterated phase collapses progressively improve separation of class means, as opposed to thresholding non-linearities.
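A small sketch of the phase collapse non-linearity this entry refers to: the complex modulus of (for example, analytic wavelet) coefficients discards their phase, which a ReLU on the real part does not; the random coefficients below are illustrative, not the paper's filter bank.

```python
import numpy as np

def phase_collapse(z):
    """Map complex coefficients z = |z| e^{i phi} to their modulus |z|, collapsing the phase."""
    return np.abs(z)

rng = np.random.default_rng(0)
modulus = rng.random(8)
z1 = modulus * np.exp(1j * rng.uniform(0, 2 * np.pi, 8))   # same modulus,
z2 = modulus * np.exp(1j * rng.uniform(0, 2 * np.pi, 8))   # different phases

print(np.allclose(phase_collapse(z1), phase_collapse(z2)))            # True: phase is eliminated
print(np.allclose(np.maximum(z1.real, 0), np.maximum(z2.real, 0)))    # False: ReLU keeps phase information
```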
arXiv Detail & Related papers (2021-10-11T13:58:01Z) - Intra-Class Uncertainty Loss Function for Classification [6.523198497365588]
Intra-class uncertainty/variability is not considered, especially for datasets containing unbalanced classes.
In our framework, the features extracted by deep networks for each class are characterized by an independent Gaussian distribution.
The proposed approach shows improved classification performance, through learning a better class representation.
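A hedged sketch of the modelling idea stated here: fit each class's deep features with its own diagonal Gaussian and score samples by the class-conditional log-likelihood; the diagonal covariance, the shapes, and the classification rule are assumptions, not the paper's loss function.

```python
import numpy as np

def class_gaussian_stats(feats, labels, n_classes):
    """Per-class mean and diagonal variance of feature vectors."""
    means = np.stack([feats[labels == c].mean(0) for c in range(n_classes)])
    variances = np.stack([feats[labels == c].var(0) + 1e-6 for c in range(n_classes)])
    return means, variances

def gaussian_log_likelihood(feats, means, variances):
    """log N(f; mu_c, diag(var_c)) for every (sample, class) pair -> shape (n, n_classes)."""
    diff2 = (feats[:, None, :] - means[None]) ** 2
    return -0.5 * np.sum(diff2 / variances[None] + np.log(2 * np.pi * variances[None]), axis=2)

# Usage sketch: predict the class whose Gaussian gives the highest likelihood.
# preds = gaussian_log_likelihood(test_feats, *class_gaussian_stats(train_feats, train_labels, K)).argmax(1)
```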
arXiv Detail & Related papers (2021-04-12T09:02:41Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
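A hedged sketch of the generic atom-coefficient decomposition behind this entry: every k x k kernel is a linear combination of a small shared dictionary of k x k atoms, so only the per-layer combination coefficients are learned; the shapes and the reconstruction step below are assumptions, not the paper's exact scheme.

```python
import numpy as np

def build_kernels(atoms, coeffs):
    """Reconstruct convolution kernels from shared atoms.

    atoms:  (n_atoms, k, k)          shared across layers
    coeffs: (c_out, c_in, n_atoms)   learned per layer
    returns (c_out, c_in, k, k) with K[o, i] = sum_a coeffs[o, i, a] * atoms[a]
    """
    return np.einsum("oia,akl->oikl", coeffs, atoms)

rng = np.random.default_rng(0)
atoms = rng.normal(size=(6, 3, 3))        # 6 shared 3x3 atoms
coeffs = rng.normal(size=(64, 32, 6))     # coefficients for a 32 -> 64 channel layer
kernels = build_kernels(atoms, coeffs)    # (64, 32, 3, 3)

print(kernels.size, atoms.size + coeffs.size)   # full kernel count vs. atoms + coefficients
```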
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
We propose a new regularization method that penalizes differences between the predictive distributions of similar samples.
This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network.
Our experimental results on various image classification tasks demonstrate that this simple yet powerful method significantly improves generalization.
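A hedged sketch of the class-wise regularizer summarized here: penalize the divergence between the temperature-scaled predictions of two different samples drawn from the same class, treating one side as a fixed soft target; the temperature, pairing, and stop-gradient convention are assumptions taken from the abstract, not the full method.

```python
import numpy as np

def softmax(logits, tau=1.0):
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classwise_consistency_penalty(logits_a, logits_b, tau=4.0):
    """Mean KL(p_b || p_a) over paired same-class samples; p_b plays the role of the detached target."""
    p_a, p_b = softmax(logits_a, tau), softmax(logits_b, tau)
    return np.mean(np.sum(p_b * (np.log(p_b + 1e-12) - np.log(p_a + 1e-12)), axis=1))

# Usage sketch: total = cross_entropy(logits_a, labels) + lam * classwise_consistency_penalty(logits_a, logits_b)
```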
arXiv Detail & Related papers (2020-03-31T06:03:51Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
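A hedged sketch of embedding propagation as a manifold-smoothing step: build a similarity graph over the embeddings and replace each one with a closed-form propagation over that graph; the RBF affinity, the scaling `alpha`, and the symmetric normalization are common label-propagation choices assumed here, not necessarily the paper's exact operator.

```python
import numpy as np

def propagate_embeddings(z, alpha=0.5, gamma=1.0):
    """Smooth embeddings z of shape (n, d) over an RBF similarity graph.

    Returns (I - alpha * S)^{-1} z, where S = D^{-1/2} A D^{-1/2} is the
    symmetrically normalized affinity matrix with a zeroed diagonal.
    """
    sq_dists = np.sum((z[:, None] - z[None]) ** 2, axis=-1)
    a = np.exp(-gamma * sq_dists)
    np.fill_diagonal(a, 0.0)                          # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(1) + 1e-12)
    s = d_inv_sqrt[:, None] * a * d_inv_sqrt[None]
    return np.linalg.solve(np.eye(len(z)) - alpha * s, z)

rng = np.random.default_rng(0)
z_smooth = propagate_embeddings(rng.normal(size=(20, 8)))   # toy batch of 20 embeddings
```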