LU-Net: Invertible Neural Networks Based on Matrix Factorization
- URL: http://arxiv.org/abs/2302.10524v1
- Date: Tue, 21 Feb 2023 08:52:36 GMT
- Title: LU-Net: Invertible Neural Networks Based on Matrix Factorization
- Authors: Robin Chan, Sarina Penquitt, Hanno Gottschalk
- Abstract summary: LU-Net is a simple and fast architecture for invertible neural networks (INN)
Invertible neural networks can be trained according to the maximum likelihood principle.
In our numerical experiments, we test the LU-net architecture as a generative model on several academic datasets.
- Score: 1.2891210250935146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LU-Net is a simple and fast architecture for invertible neural networks (INN)
that is based on the factorization of square weight matrices
$\mathsf{A=LU}$, where $\mathsf{L}$ is a lower triangular matrix with ones on
the diagonal and $\mathsf{U}$ an upper triangular matrix. Instead of learning a
dense matrix $\mathsf{A}$, we learn $\mathsf{L}$ and $\mathsf{U}$
separately. If combined with an invertible activation function, such layers can
easily be inverted whenever the diagonal entries of $\mathsf{U}$ are different
from zero. Also, the computation of the determinant of the Jacobian matrix of
such layers is cheap. Consequently, the LU architecture allows for cheap
computation of the likelihood via the change of variables formula and can be
trained according to the maximum likelihood principle. In our numerical
experiments, we test the LU-net architecture as a generative model on several
academic datasets. We also provide a detailed comparison with conventional
invertible neural networks in terms of performance, training as well as run
time.
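The mechanism described above lends itself to a short sketch. The following is a minimal, illustrative PyTorch implementation, not the authors' reference code: it assumes a standard normal base density and a leaky-ReLU activation, and the names `LULayer`, `leaky_relu`, and `nll` are placeholders chosen here. Each layer stores $\mathsf{L}$ and $\mathsf{U}$ separately, so the inverse reduces to two triangular solves and the log-determinant of the Jacobian to a sum over the diagonal of $\mathsf{U}$.

```python
import math
import torch
import torch.nn as nn


class LULayer(nn.Module):
    """One affine LU layer: y = L U x + b, with L unit lower triangular
    and U upper triangular, so det(LU) = prod(diag(U))."""

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        # Unconstrained parameters; triangular structure is enforced with masks.
        self.l_params = nn.Parameter(torch.zeros(dim, dim))
        self.u_params = nn.Parameter(torch.eye(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def _matrices(self):
        L = torch.tril(self.l_params, diagonal=-1) + torch.eye(self.dim)
        U = torch.triu(self.u_params)  # invertible whenever diag(U) has no zeros
        return L, U

    def forward(self, x):                       # x: (batch, dim)
        L, U = self._matrices()
        y = x @ (L @ U).T + self.bias
        # log|det| of the affine map: det(L) = 1, so only diag(U) contributes.
        log_det = torch.log(torch.abs(torch.diagonal(U))).sum()
        return y, log_det

    def inverse(self, y):
        L, U = self._matrices()
        # Inversion via two cheap triangular solves instead of a dense inverse.
        z = torch.linalg.solve_triangular(L, (y - self.bias).T, upper=False)
        x = torch.linalg.solve_triangular(U, z, upper=True)
        return x.T


def leaky_relu(x, alpha=0.1):
    """Invertible elementwise activation with a diagonal Jacobian,
    so its log-determinant is a simple sum over entries."""
    y = torch.where(x > 0, x, alpha * x)
    log_det = torch.where(x > 0, torch.zeros_like(x),
                          torch.full_like(x, math.log(alpha))).sum(dim=1)
    return y, log_det


def nll(layers, x):
    """Negative log-likelihood via the change of variables formula,
    assuming a standard normal base density on the latent variable."""
    z, total_log_det = x, 0.0
    for i, layer in enumerate(layers):
        z, ld = layer(z)
        total_log_det = total_log_det + ld
        if i < len(layers) - 1:                 # no activation after the last layer
            z, ld = leaky_relu(z)
            total_log_det = total_log_det + ld
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return -(log_pz + total_log_det).mean()


# Hypothetical usage: one maximum likelihood training step on 2-D toy data.
layers = nn.ModuleList([LULayer(2) for _ in range(3)])
optimizer = torch.optim.Adam(layers.parameters(), lr=1e-3)
loss = nll(layers, torch.randn(128, 2))
loss.backward()
optimizer.step()
```

Under these assumptions, sampling would proceed by drawing the latent variable from the base distribution and applying the layers' inverse maps in reverse order, inverting the activation entrywise along the way.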
Related papers
- Geometry of fibers of the multiplication map of deep linear neural networks [0.0]
We study the geometry of the set of tuples of composable matrices which multiply to a fixed matrix.
Our solution is presented in three forms: a Poincaré series in equivariant cohomology, a quadratic integer program, and an explicit formula.
arXiv Detail & Related papers (2024-11-29T18:36:03Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - A Unified Scheme of ResNet and Softmax [8.556540804058203]
We provide a theoretical analysis of the regression problem $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$.
This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before.
arXiv Detail & Related papers (2023-09-23T21:41:01Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - LU decomposition and Toeplitz decomposition of a neural network [5.276232626689567]
We show that any continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ can be approximated to arbitrary accuracy by a neural network.
A consequence of our Toeplitz result is a fixed-width universal approximation for convolutional neural networks.
arXiv Detail & Related papers (2022-11-25T07:26:39Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k \times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2} \log^2 \frac{k}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - What if Neural Networks had SVDs? [66.91160214071088]
Various neural networks employ time-consuming matrix operations like matrix inversion.
We present an algorithm that is fast enough to speed up several matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamic of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained with gradient descent can provably achieve a loss close to that of the ground-truth network, going beyond the Neural Tangent Kernel regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Constant-Depth and Subcubic-Size Threshold Circuits for Matrix
Multiplication [1.9518237361775532]
Recent advances in large-scale neural computing hardware have made the practical implementation of threshold circuits a near-term possibility.
We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic.
Dense matrix multiplication is a core operation in convolutional neural network training.
arXiv Detail & Related papers (2020-06-25T18:28:10Z) - Backward Feature Correction: How Deep Learning Performs Deep
(Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.