Entangled Residual Mappings
- URL: http://arxiv.org/abs/2206.01261v1
- Date: Thu, 2 Jun 2022 19:36:03 GMT
- Title: Entangled Residual Mappings
- Authors: Mathias Lechner, Ramin Hasani, Zahra Babaiee, Radu Grosu, Daniela Rus,
Thomas A. Henzinger, Sepp Hochreiter
- Abstract summary: We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than in attention-based and recurrent models.
- Score: 59.02488598557491
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Residual mappings have been shown to perform representation learning in the
first layers and iterative feature refinement in higher layers. This interplay,
combined with their stabilizing effect on the gradient norms, enables them to
train very deep networks. In this paper, we take a step further and introduce
entangled residual mappings to generalize the structure of the residual
connections and evaluate their role in iterative representation learning. An
entangled residual mapping replaces the identity skip connections with
specialized entangled mappings such as orthogonal, sparse, and structural
correlation matrices that share key attributes (eigenvalues, structure, and
Jacobian norm) with identity mappings. We show that while entangled mappings
can preserve the iterative refinement of features across various deep models,
they influence the representation learning process in convolutional networks
differently than attention-based models and recurrent neural networks. In
general, we find that for CNNs and Vision Transformers, entangled sparse mappings
can help generalization, while orthogonal mappings hurt performance. For
recurrent networks, orthogonal residual mappings form an inductive bias for
time-variant sequences, which degrades accuracy on time-invariant tasks.
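
As an illustration of the core idea, below is a minimal sketch of a residual block whose identity skip connection is replaced by a fixed entangled mapping. The helper and class names (make_entangled_map, EntangledResidualBlock) and the fully connected block are illustrative assumptions, not the paper's code; the paper's experiments use CNN, Vision Transformer, and RNN blocks. The "orthogonal" and "sparse" (permutation) choices are stand-ins for mappings that share the identity's key attributes (unit-magnitude eigenvalues, norm preservation).
```python
# A minimal sketch in PyTorch of an entangled residual block (illustrative,
# not the authors' implementation).
import torch
import torch.nn as nn


def make_entangled_map(dim: int, kind: str = "identity") -> torch.Tensor:
    """Build a fixed skip matrix sharing key attributes with the identity."""
    if kind == "identity":
        return torch.eye(dim)
    if kind == "orthogonal":
        m = torch.empty(dim, dim)
        nn.init.orthogonal_(m)  # all singular values equal 1, like the identity
        return m
    if kind == "sparse":
        # A random permutation matrix: sparse, orthogonal, and norm-preserving.
        return torch.eye(dim)[torch.randperm(dim)]
    raise ValueError(f"unknown kind: {kind}")


class EntangledResidualBlock(nn.Module):
    """Computes y = f(x) + M x, where M generalizes the identity skip."""

    def __init__(self, dim: int, kind: str = "identity"):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # The entangled mapping is fixed (a buffer, not a trained parameter).
        self.register_buffer("skip", make_entangled_map(dim, kind))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.f(x) + x @ self.skip.T


if __name__ == "__main__":
    block = EntangledResidualBlock(dim=16, kind="sparse")
    y = block(torch.randn(4, 16))
    print(y.shape)  # torch.Size([4, 16])
```
With kind="identity" the block reduces to a standard residual connection, so the entangled variants can be compared against it one-for-one.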
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
arXiv Detail & Related papers (2024-10-26T07:10:47Z)
- A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations and from 2 to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z)
- Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
arXiv Detail & Related papers (2023-08-01T15:04:30Z)
- Brain-like combination of feedforward and recurrent network components achieves prototype extraction and robust pattern recognition [0.0]
Associative memory has been a prominent candidate for the computation performed by the massively recurrent neocortical networks.
We combine a recurrent attractor network with a feedforward network that learns distributed representations using an unsupervised Hebbian-Bayesian learning rule.
We demonstrate that the recurrent attractor component implements associative memory when trained on the feedforward-driven internal (hidden) representations.
arXiv Detail & Related papers (2022-06-30T06:03:11Z)
- Clustering-Based Interpretation of Deep ReLU Network [17.234442722611803]
We recognize that the non-linear behavior of the ReLU function gives rise to a natural clustering.
We propose a method to increase the level of interpretability of a fully connected feedforward ReLU neural network.
arXiv Detail & Related papers (2021-10-13T09:24:11Z)
- Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
- Generating Attribution Maps with Disentangled Masked Backpropagation [22.065454879517326]
We introduce Disentangled Masked Backpropagation (DMBP) to decompose the model function into different linear mappings.
DMBP generates more visually interpretable attribution maps than previous approaches.
We quantitatively show that the maps produced by our method are more consistent with the true contribution of each pixel to the final network output.
arXiv Detail & Related papers (2021-01-17T20:32:14Z)
- Problems of representation of electrocardiograms in convolutional neural networks [58.720142291102135]
We show that these problems are systemic in nature.
They are due to how convolutional networks work with composite objects, parts of which are not fixed rigidly, but have significant mobility.
arXiv Detail & Related papers (2020-12-01T14:02:06Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.