Why should autoencoders work?
- URL: http://arxiv.org/abs/2310.02250v3
- Date: Sat, 17 Feb 2024 15:56:20 GMT
- Title: Why should autoencoders work?
- Authors: Matthew D. Kvalheim and Eduardo D. Sontag
- Abstract summary: Deep neural network autoencoders are routinely used for model reduction.
In practice the technique is found to "work" well, which leads one to ask whether there is a way to explain this effectiveness; we show that, up to small errors, the method is indeed guaranteed to work.
- Score: 1.6317061277457001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network autoencoders are routinely used computationally for model
reduction. They allow recognizing the intrinsic dimension of data that lie in a
$k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The
underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$
into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent
variables) and a decoding layer that maps $\mathbb{R}^k$ back into
$\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered
when composing the two maps. This is achieved by adjusting parameters (weights)
in the network to minimize the discrepancy between the input and the
reconstructed output. Since neural networks (with continuous activation
functions) compute continuous maps, the existence of a network that achieves
perfect reconstruction would imply that $K$ is homeomorphic to a
$k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological
obstructions to finding such a network. On the other hand, in practice the
technique is found to "work" well, which leads one to ask if there is a way to
explain this effectiveness. We show that, up to small errors, indeed the method
is guaranteed to work. This is done by appealing to certain facts from
differential topology. A computational example is also included to illustrate
the ideas.
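To make the encoder/decoder picture concrete, here is a minimal sketch in PyTorch, assuming a synthetic one-dimensional data set embedded in $\mathbb{R}^3$; the layer widths, optimizer settings, and data are illustrative choices, not the paper's computational example.

```python
# A minimal sketch of the setup described above (not the authors' construction):
# a fully connected autoencoder with an n-dimensional input, a k-dimensional
# bottleneck, and a reconstruction (mean squared error) objective.
import torch
import torch.nn as nn

n, k = 3, 1  # ambient dimension n, bottleneck / latent dimension k

# K: a 1-dimensional subset of R^3 (an open helix segment), homeomorphic to an
# interval, so a continuous encoder/decoder pair with k = 1 can exist.
t = torch.rand(1024, 1) * 4 * torch.pi
K = torch.cat([torch.cos(t), torch.sin(t), t / (4 * torch.pi)], dim=1)

encoder = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, k))  # R^n -> R^k
decoder = nn.Sequential(nn.Linear(k, 64), nn.Tanh(), nn.Linear(64, n))  # R^k -> R^n

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    reconstruction = decoder(encoder(K))        # compose the two maps
    loss = ((reconstruction - K) ** 2).mean()   # input/output discrepancy
    loss.backward()
    opt.step()
```

Because the helix segment used here is homeomorphic to an interval, a continuous encoder/decoder pair with $k=1$ can in principle achieve near-perfect reconstruction; replacing it with a closed circle would run into exactly the topological obstruction discussed in the abstract.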
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m \geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Mathematical Models of Computation in Superposition [0.9374652839580183]
Superposition poses a serious challenge to mechanistically interpreting current AI systems.
We present mathematical models of computation in superposition, where superposition is actively helpful for efficiently accomplishing the task.
We conclude by providing some potential applications of our work for interpreting neural networks that implement computation in superposition.
arXiv Detail & Related papers (2024-08-10T06:11:48Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - A Unified Scheme of ResNet and Softmax [8.556540804058203]
We provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$.
This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before.
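A small numerical sketch of this objective may help fix the notation; the matrix $A$, input $x$, and target $b$ below are random placeholders, not quantities from the paper.

```python
# Illustrative evaluation of the regression objective quoted above:
# the vector exp(Ax) + Ax is normalized by its inner product with the
# all-ones vector, then compared to a target b in squared l2 norm.
import torch

n, d = 5, 3
A = torch.randn(n, d)
x = torch.randn(d)
b = torch.randn(n)

u = torch.exp(A @ x) + A @ x            # exp(Ax) + Ax
u_normalized = u / u.sum()              # <u, 1_n>^{-1} * u
loss = ((u_normalized - b) ** 2).sum()  # squared l2 regression loss
```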
arXiv Detail & Related papers (2023-09-23T21:41:01Z) - Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers [1.189367612437469]
We explicitly determine local and global minimizers of the ${\mathcal L}^2$ cost function in underparametrized Deep Learning (DL) networks.
arXiv Detail & Related papers (2023-09-19T14:20:55Z) - Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization [1.189367612437469]
We consider shallow neural networks with one hidden layer, a ReLU activation function, and an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function.
We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of the training inputs.
In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q \leq M$.
arXiv Detail & Related papers (2023-09-19T07:12:41Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer ReLU neural network trained by gradient descent can provably learn the ground truth with small loss, going beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.