LU-Net: Invertible Neural Networks Based on Matrix Factorization
- URL: http://arxiv.org/abs/2302.10524v1
- Date: Tue, 21 Feb 2023 08:52:36 GMT
- Title: LU-Net: Invertible Neural Networks Based on Matrix Factorization
- Authors: Robin Chan, Sarina Penquitt, Hanno Gottschalk
- Abstract summary: LU-Net is a simple and fast architecture for invertible neural networks (INN)
Invertible neural networks can be trained according to the maximum likelihood principle.
In our numerical experiments, we test the LU-Net architecture as a generative model on several academic datasets.
- Score: 1.2891210250935146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LU-Net is a simple and fast architecture for invertible neural networks (INN)
that is based on the factorization of quadratic weight matrices
$\mathsf{A=LU}$, where $\mathsf{L}$ is a lower triangular matrix with ones on
the diagonal and $\mathsf{U}$ an upper triangular matrix. Instead of learning a
fully occupied matrix $\mathsf{A}$, we learn $\mathsf{L}$ and $\mathsf{U}$
separately. If combined with an invertible activation function, such layers can
easily be inverted whenever the diagonal entries of $\mathsf{U}$ are different
from zero. Also, the computation of the determinant of the Jacobian matrix of
such layers is cheap. Consequently, the LU architecture allows for cheap
computation of the likelihood via the change of variables formula and can be
trained according to the maximum likelihood principle. In our numerical
experiments, we test the LU-Net architecture as a generative model on several
academic datasets. We also provide a detailed comparison with conventional
invertible neural networks in terms of performance, training as well as run
time.
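The abstract's construction can be sketched in a few lines of NumPy. This is an illustrative toy layer under assumed initialization choices, not the paper's reference implementation: $\mathsf{L}$ is kept unit lower triangular, the diagonal of $\mathsf{U}$ is kept away from zero so the layer stays invertible, and the log-determinant of the Jacobian reduces to a sum over $\mathrm{diag}(\mathsf{U})$.

```python
import numpy as np

class LULayer:
    """Sketch of one linear LU layer: y = U @ (L @ x) + b, with L unit lower
    triangular and U upper triangular. Names and initialization are
    illustrative, not taken from the paper's code."""

    def __init__(self, n, rng):
        # Strictly lower-triangular free parameters; diag(L) is fixed to 1.
        self.L = np.eye(n) + np.tril(rng.normal(0.0, 0.1, (n, n)), k=-1)
        # Upper-triangular parameters; keep diag(U) bounded away from zero
        # so the layer remains invertible.
        self.U = np.triu(rng.normal(0.0, 0.1, (n, n)), k=1) \
            + np.diag(1.0 + np.abs(rng.normal(0.0, 0.1, n)))
        self.b = np.zeros(n)

    def forward(self, x):
        return self.U @ (self.L @ x) + self.b

    def inverse(self, y):
        # Both solves are triangular, so in practice they cost O(n^2)
        # back/forward substitutions; np.linalg.solve is used for brevity.
        z = np.linalg.solve(self.U, y - self.b)
        return np.linalg.solve(self.L, z)

    def log_abs_det_jacobian(self):
        # det(UL) = det(U) det(L) = prod(diag(U)) since diag(L) = 1, so the
        # change-of-variables term is a cheap O(n) sum of logs.
        return np.sum(np.log(np.abs(np.diag(self.U))))
```

The cheap `log_abs_det_jacobian` is what makes maximum-likelihood training via the change of variables formula inexpensive compared to a fully occupied weight matrix.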
Related papers
- Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach [36.2561379432247]
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for $2\times 2$ multiplication. We then train the same architecture on $3\times 3$ multiplication with rank $r \in \{19,\dots,23\}$.
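For reference, the classical Strassen algorithm that the summary above says StrassenNet reproduces computes a $2\times 2$ (block) product with 7 multiplications instead of the naive 8. The sketch below shows the algorithm itself, not the neural architecture:

```python
import numpy as np

def strassen_2x2(A, B):
    """One level of Strassen's recursion for 2x2 matrices (or 2x2 block
    matrices): 7 multiplications instead of the naive 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine the 7 products into the four entries of C = A @ B.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])
```

Applied recursively to block matrices, this yields the $O(n^{\log_2 7})$ multiplication algorithm; the rank-$r$ search for $3\times 3$ mentioned above targets the analogous (still open) question of the minimal number of multiplications there.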
arXiv Detail & Related papers (2026-02-25T11:22:31Z) - Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit [66.20349460098275]
We study the gradient descent learning of a general Gaussian multi-index model $f(\boldsymbol{x}) = g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U} \in \mathbb{R}^{r \times d}$. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error.
arXiv Detail & Related papers (2025-11-19T04:46:47Z) - CDFlow: Building Invertible Layers with Circulant and Diagonal Matrices [2.6154833242452864]
We introduce a novel invertible linear layer based on the product of circulant and diagonal matrices. This decomposition reduces parameter complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(mn)$. We build upon this layer to develop Circulant-Diagonal Flow (CDFlow), which achieves strong density estimation on natural image datasets.
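The circulant factor in such a decomposition is attractive because it can be applied and inverted via the FFT in $\mathcal{O}(n \log n)$. The sketch below shows this generic building block only; it is not CDFlow's actual code, and the function names are invented for illustration:

```python
import numpy as np

def circulant_matvec(c, x):
    # C @ x, where C is the circulant matrix with first column c.
    # This is a circular convolution, computed via the FFT in O(n log n).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_matvec_inv(c, y):
    # C is invertible iff no Fourier coefficient of c is zero; the inverse
    # map is then an elementwise division in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(y) / np.fft.fft(c)))
```

A diagonal factor composes with this at $\mathcal{O}(n)$ cost, which is how a circulant-diagonal product stays cheap to invert while using far fewer than $n^2$ parameters.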
arXiv Detail & Related papers (2025-10-29T09:38:50Z) - Geometry of fibers of the multiplication map of deep linear neural networks [0.0]
We study the geometry of the set of quivers of composable matrices which multiply to a fixed matrix.
Our solution is presented in three forms: a Poincaré series in equivariant cohomology, a quadratic integer program, and an explicit formula.
arXiv Detail & Related papers (2024-11-29T18:36:03Z) - Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks [2.264332709661011]
We show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}(nm)$ for a matrix in $\mathbb{R}^{n \times m}$.
Experiments show that our projection is $2$ times faster than the currently fastest Euclidean algorithms.
arXiv Detail & Related papers (2024-05-03T13:21:49Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - A Unified Scheme of ResNet and Softmax [8.556540804058203]
We provide a theoretical analysis of the regression problem $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$.
This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before.
arXiv Detail & Related papers (2023-09-23T21:41:01Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - LU decomposition and Toeplitz decomposition of a neural network [5.276232626689567]
We show that any continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ has an approximation to arbitrary accuracy by a neural network.
A consequence of our Toeplitz result is a fixed-width universal approximation for convolutional neural networks.
arXiv Detail & Related papers (2022-11-25T07:26:39Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k \times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2} \log^2 \frac{1}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the, typically low, rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - What if Neural Networks had SVDs? [66.91160214071088]
Various Neural Networks employ time-consuming matrix operations like matrix inversion.
We present an algorithm that is fast enough to speed up several matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that gradient descent can provably learn an over-parametrized two-layer neural network beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Constant-Depth and Subcubic-Size Threshold Circuits for Matrix Multiplication [1.9518237361775532]
Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility.
We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic.
Dense matrix multiplication is a core operation in convolutional neural network training.
arXiv Detail & Related papers (2020-06-25T18:28:10Z) - Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.