Separation and Concentration in Deep Networks
- URL: http://arxiv.org/abs/2012.10424v2
- Date: Mon, 15 Mar 2021 15:06:51 GMT
- Title: Separation and Concentration in Deep Networks
- Authors: John Zarka, Florentin Guth, Stéphane Mallat
- Abstract summary: Deep neural network classifiers progressively separate class distributions around their mean.
For image classification, we show that separation of class means can be achieved with rectified wavelet tight frames that are not learned.
The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.
- Score: 1.8620637029128544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical experiments demonstrate that deep neural network classifiers
progressively separate class distributions around their mean, achieving linear
separability on the training set, and increasing the Fisher discriminant ratio.
We explain this mechanism with two types of operators. We prove that a
rectifier without biases applied to sign-invariant tight frames can separate
class means and increase Fisher ratios. On the opposite, a soft-thresholding on
tight frames can reduce within-class variabilities while preserving class
means. Variance reduction bounds are proved for Gaussian mixture models. For
image classification, we show that separation of class means can be achieved
with rectified wavelet tight frames that are not learned. This defines a
scattering transform. Learning $1 \times 1$ convolutional tight frames along
scattering channels and applying a soft-thresholding reduces within-class
variabilities. The resulting scattering network reaches the classification
accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no
learned biases.
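To make the two operators concrete, here is a minimal NumPy sketch, not the authors' code: the tight frame, the toy Gaussian mixture, and the threshold value are illustrative assumptions. It applies a bias-free rectifier to coefficients in a sign-invariant tight frame (the separation step) and a soft-thresholding in the same frame followed by reconstruction (the concentration step), and reports the Fisher discriminant ratio after each operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_ratio(x, y):
    """Scalar Fisher discriminant ratio: between-class over within-class scatter."""
    mu0, mu1 = x[y == 0].mean(0), x[y == 1].mean(0)
    within = x[y == 0].var(0).sum() + x[y == 1].var(0).sum()
    return np.sum((mu1 - mu0) ** 2) / within

# Toy two-class Gaussian mixture in dimension d (class means at +/- 0.5 * mu).
d, n = 16, 2000
mu = rng.normal(size=d)
y = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, d)) + np.where(y[:, None] == 1, mu, -mu) * 0.5

# Sign-invariant Parseval tight frame F (rows +/- those of D, with D^T D = I), so F^T F = I.
m = 4 * d
D = np.linalg.qr(rng.normal(size=(m, d)))[0]   # m x d with orthonormal columns
F = np.vstack([D, -D]) / np.sqrt(2)

# Separation: rectifier without biases applied to the frame coefficients.
x_sep = np.maximum(x @ F.T, 0.0)

# Concentration: soft-thresholding of the frame coefficients, then reconstruction by F^T.
def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

x_conc = soft_threshold(x @ F.T, 0.5) @ F      # 0.5 is an arbitrary illustrative threshold

for name, feats in [("raw input", x), ("ReLU o frame", x_sep), ("soft-thresholded", x_conc)]:
    print(f"Fisher ratio, {name}: {fisher_ratio(feats, y):.3f}")
```

The printed ratios let one inspect the separation and concentration effects on this toy mixture; the paper's variance-reduction bounds for Gaussian mixture models are of course not reproduced by this sketch.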
Related papers
- The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unstable predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
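For background on the randomized smoothing this entry builds on, a minimal sketch of the standard Monte Carlo prediction and the simplified certified-radius formula of Cohen et al.; it is not this paper's certification procedure, and `base_classifier`, `sigma`, and the sample count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, n_classes=10, rng=None):
    """Majority vote of base_classifier over Gaussian perturbations of the input x."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(n_classes, dtype=int)
    for _ in range(n_samples):
        counts[base_classifier(x + sigma * rng.normal(size=x.shape))] += 1
    return int(counts.argmax()), counts / n_samples      # top class and vote frequencies

def certified_radius(p_top, sigma):
    """Simplified l2 radius sigma * Phi^{-1}(p_top); meaningful only when p_top > 0.5."""
    return sigma * norm.ppf(p_top)

# Toy usage with a dummy two-class "classifier".
dummy = lambda z: int(z.sum() > 0)
label, freq = smoothed_predict(dummy, np.ones(8), sigma=0.5, n_classes=2)
print(label, certified_radius(min(freq.max(), 0.999), sigma=0.5))
```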
arXiv Detail & Related papers (2023-09-28T22:41:47Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Discriminability-enforcing loss to improve representation learning [20.4701676109641]
We introduce a new loss term inspired by the Gini impurity to minimize the entropy of individual high-level features.
Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes.
Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone.
arXiv Detail & Related papers (2022-02-14T22:31:37Z) - Instance-based Label Smoothing For Better Calibrated Classification Networks [3.388509725285237]
Label smoothing is widely used in deep neural networks for multi-class classification.
We take inspiration from both label smoothing and self-distillation.
We propose two novel instance-based label smoothing approaches.
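For reference, a sketch of standard uniform label smoothing, the baseline these instance-based variants start from; the smoothing factor is an illustrative choice and the paper's two instance-based schemes are not reproduced here.

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Uniform label smoothing: (1 - eps) on the true class, eps spread evenly over all classes."""
    return (1.0 - eps) * np.eye(n_classes)[y] + eps / n_classes

def cross_entropy(probs, targets):
    """Mean cross-entropy against (possibly smoothed) soft targets."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

# Usage: targets = smooth_labels(np.array([0, 2, 1]), n_classes=3); loss = cross_entropy(p, targets)
```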
arXiv Detail & Related papers (2021-10-11T15:33:23Z) - Phase Collapse in Neural Networks [1.8620637029128544]
Deep convolutional image classifiers progressively transform the spatial variability into a smaller number of channels, which linearly separates all classes.
This paper demonstrates that a different mechanism, phase collapse, explains this ability to progressively eliminate spatial variability.
It is justified by explaining how iterated phase collapses progressively improve separation of class means, as opposed to thresholding non-linearities.
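A small sketch of the phase collapse non-linearity this entry refers to: the complex modulus of (for example, analytic wavelet) coefficients discards their phase, which a ReLU on the real part does not; the random coefficients below are illustrative, not the paper's filter bank.

```python
import numpy as np

def phase_collapse(z):
    """Map complex coefficients z = |z| e^{i phi} to their modulus |z|, collapsing the phase."""
    return np.abs(z)

rng = np.random.default_rng(0)
modulus = rng.random(8)
z1 = modulus * np.exp(1j * rng.uniform(0, 2 * np.pi, 8))   # same modulus,
z2 = modulus * np.exp(1j * rng.uniform(0, 2 * np.pi, 8))   # different phases

print(np.allclose(phase_collapse(z1), phase_collapse(z2)))            # True: phase is eliminated
print(np.allclose(np.maximum(z1.real, 0), np.maximum(z2.real, 0)))    # False: ReLU keeps phase information
```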
arXiv Detail & Related papers (2021-10-11T13:58:01Z) - Intra-Class Uncertainty Loss Function for Classification [6.523198497365588]
Intra-class uncertainty/variability is not considered, especially for datasets containing unbalanced classes.
In our framework, the features extracted by deep networks for each class are characterized by an independent Gaussian distribution.
The proposed approach shows improved classification performance, through learning a better class representation.
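A hedged sketch of the modelling idea stated here: fit each class's deep features with its own diagonal Gaussian and score samples by the class-conditional log-likelihood; the diagonal covariance, the shapes, and the classification rule are assumptions, not the paper's loss function.

```python
import numpy as np

def class_gaussian_stats(feats, labels, n_classes):
    """Per-class mean and diagonal variance of feature vectors."""
    means = np.stack([feats[labels == c].mean(0) for c in range(n_classes)])
    variances = np.stack([feats[labels == c].var(0) + 1e-6 for c in range(n_classes)])
    return means, variances

def gaussian_log_likelihood(feats, means, variances):
    """log N(f; mu_c, diag(var_c)) for every (sample, class) pair -> shape (n, n_classes)."""
    diff2 = (feats[:, None, :] - means[None]) ** 2
    return -0.5 * np.sum(diff2 / variances[None] + np.log(2 * np.pi * variances[None]), axis=2)

# Usage sketch: predict the class whose Gaussian gives the highest likelihood.
# preds = gaussian_log_likelihood(test_feats, *class_gaussian_stats(train_feats, train_labels, K)).argmax(1)
```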
arXiv Detail & Related papers (2021-04-12T09:02:41Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
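A hedged sketch of the generic atom-coefficient decomposition behind this entry: every k x k kernel is a linear combination of a small shared dictionary of k x k atoms, so only the per-layer combination coefficients are learned; the shapes and the reconstruction step below are assumptions, not the paper's exact scheme.

```python
import numpy as np

def build_kernels(atoms, coeffs):
    """Reconstruct convolution kernels from shared atoms.

    atoms:  (n_atoms, k, k)          shared across layers
    coeffs: (c_out, c_in, n_atoms)   learned per layer
    returns (c_out, c_in, k, k) with K[o, i] = sum_a coeffs[o, i, a] * atoms[a]
    """
    return np.einsum("oia,akl->oikl", coeffs, atoms)

rng = np.random.default_rng(0)
atoms = rng.normal(size=(6, 3, 3))        # 6 shared 3x3 atoms
coeffs = rng.normal(size=(64, 32, 6))     # coefficients for a 32 -> 64 channel layer
kernels = build_kernels(atoms, coeffs)    # (64, 32, 3, 3)

print(kernels.size, atoms.size + coeffs.size)   # full kernel count vs. atoms + coefficients
```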
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
We propose a new regularization method that penalizes differences between the predictive distributions of similar samples.
This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network.
Our experimental results on various image classification tasks demonstrate that this simple yet powerful method significantly improves generalization.
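A hedged sketch of the class-wise regularizer summarized here: penalize the divergence between the temperature-scaled predictions of two different samples drawn from the same class, treating one side as a fixed soft target; the temperature, pairing, and stop-gradient convention are assumptions taken from the abstract, not the full method.

```python
import numpy as np

def softmax(logits, tau=1.0):
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classwise_consistency_penalty(logits_a, logits_b, tau=4.0):
    """Mean KL(p_b || p_a) over paired same-class samples; p_b plays the role of the detached target."""
    p_a, p_b = softmax(logits_a, tau), softmax(logits_b, tau)
    return np.mean(np.sum(p_b * (np.log(p_b + 1e-12) - np.log(p_a + 1e-12)), axis=1))

# Usage sketch: total = cross_entropy(logits_a, labels) + lam * classwise_consistency_penalty(logits_a, logits_b)
```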
arXiv Detail & Related papers (2020-03-31T06:03:51Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
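A hedged sketch of embedding propagation as a manifold-smoothing step: build a similarity graph over the embeddings and replace each one with a closed-form propagation over that graph; the RBF affinity, the scaling `alpha`, and the symmetric normalization are common label-propagation choices assumed here, not necessarily the paper's exact operator.

```python
import numpy as np

def propagate_embeddings(z, alpha=0.5, gamma=1.0):
    """Smooth embeddings z of shape (n, d) over an RBF similarity graph.

    Returns (I - alpha * S)^{-1} z, where S = D^{-1/2} A D^{-1/2} is the
    symmetrically normalized affinity matrix with a zeroed diagonal.
    """
    sq_dists = np.sum((z[:, None] - z[None]) ** 2, axis=-1)
    a = np.exp(-gamma * sq_dists)
    np.fill_diagonal(a, 0.0)                          # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(1) + 1e-12)
    s = d_inv_sqrt[:, None] * a * d_inv_sqrt[None]
    return np.linalg.solve(np.eye(len(z)) - alpha * s, z)

rng = np.random.default_rng(0)
z_smooth = propagate_embeddings(rng.normal(size=(20, 8)))   # toy batch of 20 embeddings
```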