Phase Collapse in Neural Networks
- URL: http://arxiv.org/abs/2110.05283v1
- Date: Mon, 11 Oct 2021 13:58:01 GMT
- Title: Phase Collapse in Neural Networks
- Authors: Florentin Guth and John Zarka and Stéphane Mallat
- Abstract summary: Deep convolutional image classifiers progressively transform the spatial variability into a smaller number of channels, which linearly separates all classes.
This paper demonstrates that a different mechanism, phase collapse, explains the ability to progressively eliminate spatial variability.
This is justified by showing how iterated phase collapses progressively improve the separation of class means, in contrast to thresholding non-linearities.
- Score: 1.8620637029128544
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep convolutional image classifiers progressively transform the spatial
variability into a smaller number of channels, which linearly separates all
classes. A fundamental challenge is to understand the role of rectifiers
together with convolutional filters in this transformation. Rectifiers with
biases are often interpreted as thresholding operators which improve sparsity
and discrimination. This paper demonstrates that it is a different phase
collapse mechanism which explains the ability to progressively eliminate
spatial variability, while improving linear class separation. This is explained
and shown numerically by defining a simplified complex-valued convolutional
network architecture. It implements spatial convolutions with wavelet filters
and uses a complex modulus to collapse phase variables. This phase collapse
network reaches the classification accuracy of ResNets of similar depths,
whereas its performance is considerably degraded when replacing the phase
collapse with thresholding operators. This is justified by explaining how
iterated phase collapses progressively improve separation of class means, as
opposed to thresholding non-linearities.
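To make the abstract's contrast concrete, here is a minimal numerical sketch, not the paper's architecture: a single complex Gabor filter stands in for one wavelet channel, the complex modulus plays the role of the phase collapse, and a biased ReLU on the real part plays the role of a thresholding operator. The signal, filter parameters, and bias below are illustrative assumptions; the sketch only shows that the modulus output varies far less under a small translation, because the phase encoding the local spatial position has been discarded.

```python
import numpy as np

def gabor_filter(size=33, freq=0.25, sigma=4.0):
    """Complex Gabor atom: a Gaussian window times a complex exponential."""
    t = np.arange(size) - size // 2
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * freq * t)

def circular_conv(x, h):
    """Circular convolution via the FFT, to keep the example short."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x)))

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x_shifted = np.roll(x, 3)            # small spatial translation of the input

h = gabor_filter()                   # one complex band-pass channel
z, z_shifted = circular_conv(x, h), circular_conv(x_shifted, h)

# Phase collapse: keep only the complex modulus, discarding the phase
# that encodes local spatial position.
pc, pc_shifted = np.abs(z), np.abs(z_shifted)

# Thresholding non-linearity: ReLU with a bias applied to the real part,
# which keeps the oscillating phase information.
bias = 0.5
th = np.maximum(z.real - bias, 0.0)
th_shifted = np.maximum(z_shifted.real - bias, 0.0)

def relative_change(a, a_shifted):
    """How much the representation moves when the input is translated."""
    return np.linalg.norm(a - a_shifted) / np.linalg.norm(a)

print("phase collapse change under shift:", relative_change(pc, pc_shifted))
print("thresholding change under shift:  ", relative_change(th, th_shifted))
```

Both outputs are shift-covariant; the difference is that the modulus is a slowly varying envelope, so a three-sample translation barely changes it, whereas the thresholded real part still oscillates at the filter frequency and changes substantially. This is the sense in which a phase collapse, rather than thresholding, removes spatial variability.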
Related papers
- Inversion dynamics of class manifolds in deep learning reveals tradeoffs
underlying generalisation [0.0]
We report on numerical experiments showing how the optimisation dynamics finds representations that balance opposing tendencies with a non-monotonic trend.
The training error at the inversion is stable under subsampling, and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture.
arXiv Detail & Related papers (2023-03-09T10:35:40Z) - On the Shift Invariance of Max Pooling Feature Maps in Convolutional
Neural Networks [0.0]
Subsampled convolutions with Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts.
We highlight the crucial role played by the filter's frequency and orientation in achieving stability.
We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform.
arXiv Detail & Related papers (2022-09-19T08:15:30Z) - Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
arXiv Detail & Related papers (2022-06-02T19:36:03Z) - Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z) - Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We combine this learned aggregation layer with a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z) - Rethinking Skip Connection with Layer Normalization in Transformers and
ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how scale factors affect the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z) - Separation and Concentration in Deep Networks [1.8620637029128544]
Deep neural network classifiers progressively separate class distributions around their mean.
For image classification, we show that separation of class means can be achieved with rectified wavelet tight frames that are not learned.
The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.
arXiv Detail & Related papers (2020-12-18T18:27:37Z) - Deep Networks from the Principle of Rate Reduction [32.87280757001462]
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative ascent gradient scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
arXiv Detail & Related papers (2020-10-27T06:01:43Z) - Deriving Differential Target Propagation from Iterating Approximate
Inverses [91.3755431537592]
We show that a particular form of target propagation, one that relies on learned inverses of each layer and is differential, gives rise to an update rule corresponding to an approximate Gauss-Newton gradient-based optimization.
We consider several iterative calculations based on local auto-encoders at each layer in order to achieve more precise inversions for more accurate target propagation.
arXiv Detail & Related papers (2020-07-29T22:34:45Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios, by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
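As a companion to the last entry above (Kernel and Rich Regimes in Overparametrized Models), here is a minimal sketch of the kind of transition it describes, using the standard diagonal linear network toy model rather than the paper's full experiments; the data sizes, learning rate, and step count are illustrative assumptions. Large initialization keeps gradient descent in a kernel-like regime whose interpolant resembles a minimum-l2 solution, while small initialization yields a rich regime whose interpolant is close to the sparse minimum-l1 solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 40, 3                       # samples, dimension, sparsity (underdetermined)
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:k] = 1.0                          # sparse ground truth
y = X @ w_star                            # noiseless linear regression data

def train_diagonal_net(alpha, lr=1e-3, steps=200_000):
    """Gradient descent on f(x) = <u*u - v*v, x>, initialized at u = v = alpha."""
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    for _ in range(steps):
        w = u * u - v * v                 # effective linear predictor
        grad_w = X.T @ (X @ w - y) / n    # gradient of the squared loss w.r.t. w
        u -= lr * 2.0 * u * grad_w        # chain rule through u*u
        v += lr * 2.0 * v * grad_w        # chain rule through -v*v
    return u * u - v * v

for alpha in (2.0, 0.01):                 # large init: kernel-like; small init: rich
    w = train_diagonal_net(alpha)
    print(f"alpha={alpha:5}: ||w||_1 = {np.linalg.norm(w, 1):.2f}, "
          f"||w - w*||_2 = {np.linalg.norm(w - w_star):.2f}")
```

In a typical run of this sketch, the small-initialization solution has a noticeably smaller l1 norm and lies closer to the sparse target, illustrating the implicit-bias transition the paper analyzes; exact numbers depend on the random seed and problem sizes.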
This list is automatically generated from the titles and abstracts of the papers on this site.