The dynamics of representation learning in shallow, non-linear
autoencoders
- URL: http://arxiv.org/abs/2201.02115v1
- Date: Thu, 6 Jan 2022 15:57:31 GMT
- Title: The dynamics of representation learning in shallow, non-linear
autoencoders
- Authors: Maria Refinetti and Sebastian Goldt
- Abstract summary: We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
- Score: 3.1219977244201056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoencoders are the simplest neural network for unsupervised learning, and
thus an ideal framework for studying feature learning. While a detailed
understanding of the dynamics of linear autoencoders has recently been
obtained, the study of non-linear autoencoders has been hindered by the
technical difficulty of handling training data with non-trivial correlations -
a fundamental prerequisite for feature extraction. Here, we study the dynamics
of feature learning in non-linear, shallow autoencoders. We derive a set of
asymptotically exact equations that describe the generalisation dynamics of
autoencoders trained with stochastic gradient descent (SGD) in the limit of
high-dimensional inputs. These equations reveal that autoencoders learn the
leading principal components of their inputs sequentially. An analysis of the
long-time dynamics explains the failure of sigmoidal autoencoders to learn with
tied weights, and highlights the importance of training the bias in ReLU
autoencoders. Building on previous results for linear networks, we analyse a
modification of the vanilla SGD algorithm which allows learning of the exact
principal components. Finally, we show that our equations accurately describe
the generalisation dynamics of non-linear autoencoders on realistic datasets
such as CIFAR10.
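To make the setting concrete, below is a minimal NumPy sketch of the kind of model the abstract describes: a shallow ReLU autoencoder with tied weights and a trainable bias, trained by vanilla one-sample SGD on synthetic data with a planted low-rank covariance. The dimensions, learning rate and data model are illustrative assumptions, not the paper's analytical setup; the final check only measures how well the learned weights span the leading PCA subspace (per the abstract, recovering the exact principal components requires a modified SGD algorithm).
```python
# Minimal sketch (illustrative hyperparameters, not the paper's setup): a shallow
# ReLU autoencoder with tied weights and a trainable bias, trained with one-sample SGD.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 4, 5000                    # input dimension, bottleneck width, samples

# Synthetic data with a planted ("spiked") low-rank covariance structure.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))        # planted principal directions
coeffs = rng.standard_normal((n, k)) * np.array([4.0, 3.0, 2.0, 1.5])
X = coeffs @ U.T + 0.5 * rng.standard_normal((n, d))    # signal + isotropic noise

W = rng.standard_normal((k, d)) / np.sqrt(d)            # tied encoder/decoder weights
b = np.zeros(k)                                         # trainable bias (important for ReLU)
lr = 0.005

for step in range(50_000):
    x = X[rng.integers(n)]                # one-sample ("online") SGD
    h = np.maximum(W @ x + b, 0.0)        # encoder: ReLU(Wx + b)
    x_hat = W.T @ h                       # decoder with tied weights
    err = x_hat - x
    # Gradients of 0.5 * ||x_hat - x||^2; W receives both encoder and decoder terms.
    dh = (W @ err) * (h > 0)
    W -= lr * (np.outer(h, err) + np.outer(dh, x))
    b -= lr * dh

# Compare the span of the learned weights with the leading PCA directions of the data.
pcs = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][:k]   # top-k right singular vectors
Q, _ = np.linalg.qr(W.T)                                         # orthonormal basis of W's rows
overlap = np.linalg.svd(pcs @ Q)[1]       # singular values near 1 => aligned subspaces
print("subspace overlap (singular values):", np.round(overlap, 2))
```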
Related papers
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
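As a rough illustration of the linear-predictor view in the summary above, the following sketch (hypothetical dimensions and random weights, not the paper's construction) computes single-head attention with the softmax removed; once the query and key projections are merged into one matrix, the output is linear in that merged parameter for a fixed value projection, which is the sense in which the model can be analysed as a linear predictor over fixed features of the input.
```python
# Minimal sketch of single-head linear attention (softmax removed); dimensions and
# weights are illustrative assumptions, not the construction used in the paper.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 8, 16
X = rng.standard_normal((n_tokens, d))            # one input sequence of token embeddings

W_q = rng.standard_normal((d, d)) / np.sqrt(d)    # query projection
W_k = rng.standard_normal((d, d)) / np.sqrt(d)    # key projection
W_v = rng.standard_normal((d, d)) / np.sqrt(d)    # value projection

def linear_attention(X, W_q, W_k, W_v):
    # Without a softmax the attention scores enter linearly, so the whole map can be
    # written as X @ M @ X.T @ X @ W_v with the merged matrix M = W_q @ W_k.T.
    M = W_q @ W_k.T
    return (X @ M @ X.T) @ (X @ W_v) / X.shape[0]

out = linear_attention(X, W_q, W_k, W_v)
print(out.shape)          # (n_tokens, d); output is linear in M for a fixed W_v
```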
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - Physics-enhanced Gaussian Process Variational Autoencoder [21.222154875601984]
Variational autoencoders allow learning a lower-dimensional latent space from high-dimensional input/output data.
We propose a physics-enhanced variational autoencoder that places a physics-enhanced Gaussian process prior on the latent dynamics.
The benefits of the proposed approach are highlighted in a simulation with an oscillating particle.
arXiv Detail & Related papers (2023-05-15T20:41:39Z) - On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning and then propose a simple-yet-effective numerical solver, Attr, which introduces an additive self-attention mechanism to the numerical solution of differential equations.
arXiv Detail & Related papers (2023-02-05T01:39:21Z) - Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of the distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - On the Regularization of Autoencoders [14.46779433267854]
We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder.
We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last layer.
We demonstrate that it is an accurate approximation across all model-ranks in our experiments on three well-known data sets.
arXiv Detail & Related papers (2021-10-21T18:28:25Z) - Implicit Greedy Rank Learning in Autoencoders via Overparameterized
Linear Networks [7.412225511828064]
Deep linear networks trained with gradient descent yield low-rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
arXiv Detail & Related papers (2021-07-02T23:17:50Z) - Training Stacked Denoising Autoencoders for Representation Learning [0.0]
We implement stacked autoencoders, a class of neural networks that are capable of learning powerful representations of high dimensional data.
We describe gradient descent for unsupervised training of autoencoders, as well as a novel genetic algorithm based approach that makes use of gradient information.
arXiv Detail & Related papers (2021-02-16T08:18:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.