Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks
- URL: http://arxiv.org/abs/2107.01301v1
- Date: Fri, 2 Jul 2021 23:17:50 GMT
- Title: Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks
- Authors: Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi, Joshua M. Susskind
- Abstract summary: Deep linear networks trained with gradient descent yield low rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
- Score: 7.412225511828064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep linear networks trained with gradient descent yield low rank solutions,
as is typically studied in matrix factorization. In this paper, we take a step
further and analyze implicit rank regularization in autoencoders. We show
greedy learning of low-rank latent codes induced by a linear sub-network at the
autoencoder bottleneck. We further propose orthogonal initialization and
principled learning rate adjustment to mitigate sensitivity of training
dynamics to spectral prior and linear depth. With linear autoencoders on
synthetic data, our method converges stably to ground-truth latent code rank.
With nonlinear autoencoders, our method converges to latent ranks optimal for
downstream classification and image sampling.
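To make the setup concrete, the following is a minimal sketch (not the authors' implementation) of a linear autoencoder whose bottleneck is overparameterized by a stack of square linear layers, each initialized orthogonally as the abstract suggests; the dimensions, depth, learning rate, and synthetic data below are illustrative assumptions.

```python
# Sketch: linear autoencoder with an overparameterized linear sub-network at the
# bottleneck and orthogonal initialization. All sizes and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn as nn

def ortho_linear(dim_in, dim_out):
    layer = nn.Linear(dim_in, dim_out, bias=False)
    nn.init.orthogonal_(layer.weight)          # orthogonal initialization
    return layer

data_dim, latent_dim, bottleneck_depth = 64, 16, 3   # assumed sizes

# Extra square layers at the bottleneck overparameterize the linear sub-network
# without changing its expressivity; gradient descent on this factorization is
# what induces the implicit (greedy, low-rank) regularization of the latent code.
model = nn.Sequential(
    ortho_linear(data_dim, latent_dim),                               # encoder
    *[ortho_linear(latent_dim, latent_dim) for _ in range(bottleneck_depth)],
    ortho_linear(latent_dim, data_dim),                               # decoder
)

x = torch.randn(512, data_dim)                 # stand-in for synthetic data
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
    loss.backward()
    opt.step()
```

In such a sketch, the effective rank of the learned latent code can be monitored through the singular values of the product of the bottleneck weight matrices during training.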
Related papers
- Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders [0.0]
We introduce a novel weight initialisation technique called the Straddled Matrix Initialiser.
The combination of the Straddled Matrix and the ReLU activation function initialises a neural network as a de facto linear model.
In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
arXiv Detail & Related papers (2023-11-17T18:43:32Z)
- Iterative Sketching for Secure Coded Regression [66.53950020718021]
We propose methods for speeding up distributed linear regression.
Specifically, we randomly rotate the basis of the system of equations and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem.
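As a rough illustration of that rotate-then-subsample recipe, here is a hedged sketch (not the paper's secure coded-computing protocol); problem sizes and the number of blocks are assumptions.

```python
# Sketch: rotate the system of equations with a random orthogonal matrix, then
# keep only a few row blocks before solving the reduced least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_blocks, kept = 1024, 32, 16, 4             # assumed problem sizes
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal rotation
A_rot, b_rot = Q @ A, Q @ b

blocks = np.array_split(np.arange(n), n_blocks)    # partition rows into blocks
chosen = rng.choice(n_blocks, size=kept, replace=False)
rows = np.concatenate([blocks[i] for i in chosen]) # subsampled, reduced system

x_hat, *_ = np.linalg.lstsq(A_rot[rows], b_rot[rows], rcond=None)
```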
arXiv Detail & Related papers (2023-08-08T11:10:42Z)
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
- Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z)
- The dynamics of representation learning in shallow, non-linear autoencoders [3.1219977244201056]
We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
arXiv Detail & Related papers (2022-01-06T15:57:31Z)
- On the Regularization of Autoencoders [14.46779433267854]
We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder.
We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last layer.
In our experiments on three well-known data sets, we demonstrate that this is an accurate approximation across all model ranks.
arXiv Detail & Related papers (2021-10-21T18:28:25Z)
- Training Stacked Denoising Autoencoders for Representation Learning [0.0]
We implement stacked autoencoders, a class of neural networks that are capable of learning powerful representations of high dimensional data.
We describe gradient descent for unsupervised training of autoencoders, as well as a novel genetic algorithm based approach that makes use of gradient information.
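For concreteness, here is a minimal sketch of the gradient-descent side of greedy layer-wise denoising-autoencoder pretraining (the paper's genetic-algorithm variant is not shown); layer widths, noise level, and step counts are assumptions.

```python
# One greedy stage: corrupt the input, reconstruct the clean input, then feed
# the learned codes to the next stage.
import torch
import torch.nn as nn

def train_dae_layer(x, hidden_dim, noise_std=0.3, steps=500, lr=1e-2):
    enc = nn.Sequential(nn.Linear(x.shape[1], hidden_dim), nn.ReLU())
    dec = nn.Linear(hidden_dim, x.shape[1])
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_noisy = x + noise_std * torch.randn_like(x)        # corrupt the input
        loss = nn.functional.mse_loss(dec(enc(x_noisy)), x)  # reconstruct clean input
        loss.backward()
        opt.step()
    return enc(x).detach()

x = torch.randn(256, 784)        # stand-in for high-dimensional data
h1 = train_dae_layer(x, 256)     # first stacked layer
h2 = train_dae_layer(h1, 64)     # second layer trained on the first layer's codes
```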
arXiv Detail & Related papers (2021-02-16T08:18:22Z)
- Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute before emitting an output, so the network depth is adjusted dynamically for each task at inference time.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
- MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.