Training Invertible Linear Layers through Rank-One Perturbations
- URL: http://arxiv.org/abs/2010.07033v2
- Date: Tue, 1 Dec 2020 00:58:50 GMT
- Title: Training Invertible Linear Layers through Rank-One Perturbations
- Authors: Andreas Krämer, Jonas Köhler and Frank Noé
- Abstract summary: This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing and thus the mode separation of the resulting normalizing flows.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many types of neural network layers rely on matrix properties such as
invertibility or orthogonality. Retaining such properties during optimization
with gradient-based stochastic optimizers is a challenging task, which is
usually addressed by either reparameterization of the affected parameters or by
directly optimizing on the manifold. This work presents a novel approach for
training invertible linear layers. In lieu of directly optimizing the network
parameters, we train rank-one perturbations and add them to the actual weight
matrices infrequently. This P$^{4}$Inv update allows keeping track of inverses
and determinants without ever explicitly computing them. We show how such
invertible blocks improve the mixing and thus the mode separation of the
resulting normalizing flows. Furthermore, we outline how the P$^4$ concept can
be utilized to retain properties other than invertibility.
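To make the update mechanics concrete, the sketch below shows a rank-one perturbation being merged into a weight matrix while the inverse and log-determinant are updated in closed form via the Sherman-Morrison formula and the matrix determinant lemma. It is a minimal numpy illustration; the names, rejection threshold, and merge schedule are assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a rank-one weight update that keeps the inverse and
# log-determinant in sync, in the spirit of the P^4Inv idea described above.
# The merge schedule and threshold are illustrative assumptions.
rng = np.random.default_rng(0)
d = 4

W = np.eye(d)        # invertible weight matrix
W_inv = np.eye(d)    # tracked inverse
log_det = 0.0        # tracked log|det W|

def merge_rank_one(W, W_inv, log_det, u, v):
    """Apply W <- W + u v^T, updating inverse and log-determinant in closed form."""
    # Matrix determinant lemma: det(W + u v^T) = (1 + v^T W^{-1} u) * det(W)
    gamma = 1.0 + v @ W_inv @ u
    if abs(gamma) < 1e-6:
        # The update would (nearly) destroy invertibility; reject it.
        return W, W_inv, log_det, False
    W_new = W + np.outer(u, v)
    # Sherman-Morrison: (W + u v^T)^{-1} = W^{-1} - (W^{-1} u v^T W^{-1}) / gamma
    W_inv_new = W_inv - (W_inv @ np.outer(u, v) @ W_inv) / gamma
    return W_new, W_inv_new, log_det + np.log(abs(gamma)), True

# Train the perturbation vectors u, v (random stand-ins here), then merge them
# into W only occasionally, mirroring the infrequent updates described above.
u, v = 0.1 * rng.normal(size=d), 0.1 * rng.normal(size=d)
W, W_inv, log_det, accepted = merge_rank_one(W, W_inv, log_det, u, v)

print(accepted,
      np.allclose(W @ W_inv, np.eye(d)),
      np.isclose(log_det, np.linalg.slogdet(W)[1]))
```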
Related papers
- Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation [53.88562288388169]
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks.
We propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix.
SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.
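For orientation, the factorization referred to above can be checked directly in numpy; the snippet below only illustrates the decomposition itself, not the proposed PEFT scheme.

```python
import numpy as np

# A = U @ diag(S) @ Vh: U and Vh have orthonormal columns/rows ((near-)unitary),
# and S holds the non-negative scaling values.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))

U, S, Vh = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A, U @ np.diag(S) @ Vh))   # exact reconstruction
print(np.allclose(U.T @ U, np.eye(3)),       # columns of U are orthonormal
      np.allclose(Vh @ Vh.T, np.eye(3)))     # rows of Vh are orthonormal
```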
arXiv Detail & Related papers (2024-10-30T12:08:30Z) - Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogously to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm.
arXiv Detail & Related papers (2024-06-11T04:52:23Z) - Unnatural Algorithms in Machine Learning [0.0]
We show that optimization algorithms with this property can be viewed as discrete approximations of natural gradient descent.
We present a simple method for introducing this naturality more generally and examine a number of popular machine learning training algorithms.
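As a point of reference for the natural-gradient connection, one step of natural gradient descent preconditions the ordinary gradient with an inverse Fisher-information estimate. The toy sketch below uses an assumed empirical-Fisher estimate and is not the paper's construction.

```python
import numpy as np

# One natural-gradient step: theta <- theta - lr * F^{-1} g, with F an
# (assumed) empirical Fisher estimate built from per-example gradients.
rng = np.random.default_rng(2)

theta = rng.normal(size=3)
per_example_grads = rng.normal(size=(100, 3))   # stand-in per-example gradients
g = per_example_grads.mean(axis=0)

F = per_example_grads.T @ per_example_grads / len(per_example_grads)
F += 1e-3 * np.eye(3)                           # damping keeps F invertible

theta = theta - 0.1 * np.linalg.solve(F, g)
print(theta)
```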
arXiv Detail & Related papers (2023-12-07T22:43:37Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
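A minimal sketch of the interpolation idea, under an assumed step size and a toy quadratic objective (not the paper's setting): the raw update is treated as an operator T and the iterate is averaged with its output.

```python
import numpy as np

# Stabilize an update rule by linear interpolation of iterates:
#   w <- (1 - lam) * w + lam * T(w),
# a Krasnosel'skii-Mann-style averaging of the operator T.
def T(w, lr=0.8):
    grad = 2.0 * w          # gradient of the toy objective f(w) = ||w||^2
    return w - lr * grad    # the raw (possibly unstable) update

w = np.array([1.0, -2.0])
lam = 0.5                   # interpolation weight in (0, 1)
for _ in range(50):
    w = (1.0 - lam) * w + lam * T(w)
print(w)                    # approaches the minimizer at the origin
```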
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank, and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
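To illustrate what minimizing explicitly over part of the variables looks like, here is a toy Variable Projection example for a separable model y ≈ a·exp(-b·t): the linear coefficient a is eliminated in closed form and only b is searched over. The model, data, and grid search are assumptions for illustration, not the paper's solver.

```python
import numpy as np

# Toy VarPro: eliminate the linear coefficient a of y ~ a * exp(-b * t)
# in closed form, leaving a one-dimensional problem in b.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * t) + 0.01 * rng.normal(size=t.size)

def projected_residual(b):
    phi = np.exp(-b * t)
    a = (phi @ y) / (phi @ phi)   # closed-form least-squares solution for a
    return np.sum((y - a * phi) ** 2), a

# Minimize over b alone (a coarse grid search keeps the sketch short).
bs = np.linspace(0.1, 3.0, 300)
best_b = min(bs, key=lambda b: projected_residual(b)[0])
_, best_a = projected_residual(best_b)
print(best_a, best_b)             # close to the true values (2.0, 1.3)
```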
arXiv Detail & Related papers (2022-05-03T09:23:07Z) - Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
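The memory pressure comes from the standard backward pass of a linear layer, which needs the saved input activations to form the weight gradient. The sketch below only makes that dependence explicit; the paper's reduction technique is not reproduced here.

```python
import numpy as np

# For y = x @ W, backprop needs
#   dL/dW = x^T @ dL/dy   and   dL/dx = dL/dy @ W^T,
# so the input batch x is normally kept in memory until the backward pass.
rng = np.random.default_rng(4)
batch, d_in, d_out = 1024, 512, 256

x = rng.normal(size=(batch, d_in))        # activations saved for backward
W = rng.normal(size=(d_in, d_out))
grad_y = rng.normal(size=(batch, d_out))  # gradient arriving from the next layer

grad_W = x.T @ grad_y                     # requires the stored activations x
grad_x = grad_y @ W.T                     # passed on to the previous layer
print(grad_W.shape, grad_x.shape, x.nbytes / 2**20, "MiB of saved activations")
```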
arXiv Detail & Related papers (2022-01-31T13:02:41Z) - On the training of sparse and dense deep neural networks: less
parameters, same performance [0.0]
We propose a variant of the spectral learning method that appeared in Giambagli et al., Nat. Comm. 2021.
The eigenvalues act as veritable knobs which can be freely tuned so as to (i) enhance, or alternatively silence, the contribution of the input nodes.
Each spectral parameter reflects back on the whole set of inter-node weights, an attribute which we shall effectively exploit to yield sparse networks with stunning classification abilities.
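One way to read the "knobs" picture is to fix an eigenbasis and train only the spectrum, so that each eigenvalue touches every inter-node weight at once. The sketch below is such a parameterization under assumed choices (a random orthogonal basis), not the exact scheme of the paper.

```python
import numpy as np

# Spectral parameterization: W = Phi @ diag(eigvals) @ Phi^T with a fixed
# orthogonal basis Phi; only the eigenvalues act as trainable knobs.
rng = np.random.default_rng(5)
n = 6

Phi, _ = np.linalg.qr(rng.normal(size=(n, n)))  # fixed orthogonal eigenbasis
eigvals = rng.normal(size=n)                    # trainable spectral knobs

def weights(eigvals):
    return Phi @ np.diag(eigvals) @ Phi.T

W_before = weights(eigvals)
eigvals[0] = 0.0                                # "silence" one spectral knob
W_after = weights(eigvals)
print(np.max(np.abs(W_after - W_before)))       # every entry of W shifts
```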
arXiv Detail & Related papers (2021-06-17T14:54:23Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
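One plausible reading of gradients "along the output channel direction" (an assumption here, not the paper's stated method) is to treat the gradient of a convolutional layer as one block per output channel and rescale each block by its own norm, as sketched below.

```python
import numpy as np

# Rescale each output channel's gradient block to unit Euclidean norm.
# This is only an illustration of one channel-wise reading of the idea above.
rng = np.random.default_rng(6)
c_out, c_in, k = 8, 4, 3

grad = rng.normal(size=(c_out, c_in, k, k))          # conv-weight gradient
flat = grad.reshape(c_out, -1)                       # one row per output channel
norms = np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12
channel_directed = (flat / norms).reshape(grad.shape)

print(np.allclose(np.linalg.norm(channel_directed.reshape(c_out, -1), axis=1), 1.0))
```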
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Eigendecomposition-Free Training of Deep Networks for Linear
Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.