On the Regularization of Autoencoders
- URL: http://arxiv.org/abs/2110.11402v1
- Date: Thu, 21 Oct 2021 18:28:25 GMT
- Title: On the Regularization of Autoencoders
- Authors: Harald Steck and Dario Garcia Garcia
- Abstract summary: We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder.
We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last hidden layer.
We also derive a closed-form approximation to the optimum of the low-rank EDLAE training objective, and demonstrate that it is accurate across all model-ranks in our experiments on three well-known data sets.
- Score: 14.46779433267854
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While much work has been devoted to understanding the implicit (and explicit)
regularization of deep nonlinear networks in the supervised setting, this paper
focuses on unsupervised learning, i.e., autoencoders are trained with the
objective of reproducing the input at the output. We extend recent results
[Jin et al. 2021] on unconstrained linear models and apply them to (1)
nonlinear autoencoders and (2) constrained linear autoencoders, obtaining the
following two results: first, we show that the unsupervised setting by itself
induces strong additional regularization, i.e., a severe reduction in the
model-capacity of the learned autoencoder: we derive that a deep nonlinear
autoencoder cannot fit the training data more accurately than a linear
autoencoder does if both models have the same dimensionality in their last
hidden layer (and under a few additional assumptions). Our second contribution
is concerned with the low-rank EDLAE model [Steck 2020], which is a linear
autoencoder with a constraint on the diagonal of the learned low-rank
parameter-matrix for improved generalization: we derive a closed-form
approximation to the optimum of its non-convex training-objective, and
empirically demonstrate that it is an accurate approximation across all
model-ranks in our experiments on three well-known data sets.
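The first result rests on a simple observation: if the last hidden layer has width k, every reconstruction matrix the network can produce has rank at most k, so its training error is lower-bounded by the best rank-k linear fit (the truncated SVD, by the Eckart-Young theorem), which an unconstrained linear autoencoder attains. A minimal NumPy sketch of this bound (the random tanh encoder is a stand-in for any nonlinear encoder; the bound holds regardless of how the encoder is trained):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 30, 5          # samples, input dim, last-hidden-layer width

X = rng.standard_normal((n, d))

# Best rank-k fit of the training data (optimal unconstrained linear
# autoencoder, by Eckart-Young): keep the top-k singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_lin = (U[:, :k] * s[:k]) @ Vt[:k]
err_linear = np.mean((X - X_lin) ** 2)

# A "nonlinear autoencoder" with a width-k last hidden layer: here a
# hypothetical random tanh encoder H (n x k), followed by the
# least-squares optimal linear output layer W (k x d).
H = np.tanh(X @ rng.standard_normal((d, k)))
W, *_ = np.linalg.lstsq(H, X, rcond=None)
err_nonlinear = np.mean((X - H @ W) ** 2)

# The reconstruction H @ W has rank <= k, so it cannot beat the
# rank-k truncated SVD on the training data.
print(err_linear <= err_nonlinear + 1e-12)  # True
```

No amount of training of the encoder changes this conclusion, since the output layer already caps the rank of the reconstruction.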
Related papers
- Loss-Free Machine Unlearning [51.34904967046097]
We present a machine unlearning approach that is both retraining- and label-free.
Retraining-free approaches often utilise Fisher information, which is derived from the loss and requires labelled data which may not be available.
We present an extension to the Selective Synaptic Dampening algorithm, substituting the diagonal of the Fisher information matrix for the gradient of the l2 norm of the model output to approximate sensitivity.
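The label-free substitution can be sketched for a toy linear model: the sensitivity score is the squared gradient of the squared l2 norm of the model output with respect to each parameter, which needs no labels and no loss. This is only an illustrative sketch of that quantity, not the authors' Selective Synaptic Dampening implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in = 3, 4
W = rng.standard_normal((d_out, d_in))   # toy model: f(x) = W @ x
x = rng.standard_normal(d_in)

# Label-free sensitivity: the gradient of ||f(x)||^2 w.r.t. each
# parameter, squared, replaces the diagonal of the (label-dependent)
# Fisher information. For f(x) = W x:  d/dW ||W x||^2 = 2 (W x) x^T.
grad = 2.0 * np.outer(W @ x, x)
sensitivity = grad ** 2                  # per-parameter importance score

# Sanity check of the closed form against a central finite difference.
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
fd = (np.sum((Wp @ x) ** 2) - np.sum((Wm @ x) ** 2)) / (2 * eps)
print(abs(fd - grad[0, 0]) < 1e-4)  # True
```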
arXiv Detail & Related papers (2024-02-29T16:15:34Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- It's Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation [4.8802420827610025]
This paper aims to theoretically understand the properties of two terms in linear autoencoders.
We propose simple-yet-effective linear autoencoder models using diagonal inequality constraints, called Relaxed Linear AutoEncoder (RLAE) and Relaxed Denoising Linear AutoEncoder (RDLAE)
Experimental results demonstrate that our models are comparable or superior to state-of-the-art linear and non-linear models on six benchmark datasets.
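The diagonal constraints these models relax come from the classical full-rank linear autoencoder for recommendation (EASE [Steck 2019]), whose zero-diagonal equality constraint admits a closed-form solution. A minimal sketch of that baseline closed form (toy data; the relaxed RLAE/RDLAE models replace the equality with inequality constraints):

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, lam = 100, 12, 10.0
X = (rng.random((n_users, n_items)) < 0.3).astype(float)  # toy implicit feedback

# Closed form of the L2-regularized linear autoencoder with a
# zero-diagonal EQUALITY constraint:
#   P = (X^T X + lam * I)^{-1},  B_ij = delta_ij - P_ij / P_jj
P = np.linalg.inv(X.T @ X + lam * np.eye(n_items))
B = np.eye(n_items) - P / np.diag(P)   # column j divided by P_jj

# The constraint holds by construction: B_jj = 1 - P_jj / P_jj = 0,
# so an item can never trivially "recommend itself".
print(np.allclose(np.diag(B), 0.0))  # True

scores = X @ B                         # recommendation scores per user/item
```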
arXiv Detail & Related papers (2023-05-22T11:09:49Z)
- Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
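The lossy-compression reading of a sign-activation autoencoder can be illustrated directly: each Gaussian input is stored as k bits and reconstructed linearly. This sketch uses a fixed random encoder with a least-squares decoder (the paper analyzes trained encoders and their fundamental limits; this only shows that the sign code already carries usable information):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 2000, 8, 16   # samples, Gaussian dimension, code width (k sign units)

X = rng.standard_normal((n, d))           # Gaussian source to be compressed

# Shallow autoencoder with sign activation: each input becomes k bits.
A = rng.standard_normal((d, k))           # hypothetical fixed random encoder
Z = np.sign(X @ A)                        # k-bit code per sample
D, *_ = np.linalg.lstsq(Z, X, rcond=None) # least-squares linear decoder
mse = np.mean((X - Z @ D) ** 2)

baseline = np.mean(X ** 2)                # distortion of the zero decoder
print(mse < baseline)  # True: the k-bit code reduces distortion
```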
arXiv Detail & Related papers (2022-12-27T12:37:34Z)
- Laplacian Autoencoders for Learning Stochastic Representations [0.6999740786886537]
We present a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower-bound of the autoencoder evidence.
We show that our Laplacian autoencoder estimates well-calibrated uncertainties in both latent and output space.
arXiv Detail & Related papers (2022-06-30T07:23:16Z)
- The dynamics of representation learning in shallow, non-linear autoencoders [3.1219977244201056]
We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
arXiv Detail & Related papers (2022-01-06T15:57:31Z)
- Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks [7.412225511828064]
Deep linear networks trained with gradient descent yield low rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
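The low-rank bias of gradient descent on deep linear networks can be seen in a minimal sketch: factor a map as W2 @ W1 with no rank constraint, fit a rank-1 target from small random init, and the learned product ends up (numerically) rank-1. This is only an illustrative toy under assumed hyperparameters, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
u, v = rng.standard_normal(d), rng.standard_normal(d)
A = np.outer(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))  # rank-1 target

# Overparameterized linear factorization: W2 @ W1 could have full rank,
# yet full-batch GD on ||W2 W1 - A||_F^2 from small init stays low-rank.
W1 = 0.05 * rng.standard_normal((d, d))
W2 = 0.05 * rng.standard_normal((d, d))
lr = 0.05
for _ in range(4000):
    R = W2 @ W1 - A                                  # residual
    W2, W1 = W2 - lr * 2 * R @ W1.T, W1 - lr * 2 * W2.T @ R

s = np.linalg.svd(W2 @ W1, compute_uv=False)
print("top-2 singular value ratio:", s[1] / s[0])    # near zero: rank ~ 1
```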
arXiv Detail & Related papers (2021-07-02T23:17:50Z)
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Learning the Linear Quadratic Regulator from Nonlinear Observations [135.66883119468707]
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR.
In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs.
Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
arXiv Detail & Related papers (2020-10-08T07:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.