Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods
- URL: http://arxiv.org/abs/2212.13468v1
- Date: Tue, 27 Dec 2022 12:37:34 GMT
- Title: Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods
- Authors: Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli
- Abstract summary: This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
- Score: 91.54785981649228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoencoders are a popular model in many branches of machine learning and
lossy data compression. However, their fundamental limits, the performance of
gradient methods and the features learnt during optimization remain poorly
understood, even in the two-layer setting. In fact, earlier work has considered
either linear autoencoders or specific training regimes (leading to vanishing
or diverging compression rates). Our paper addresses this gap by focusing on
non-linear two-layer autoencoders trained in the challenging proportional
regime in which the input dimension scales linearly with the size of the
representation. Our results characterize the minimizers of the population risk,
and show that such minimizers are achieved by gradient methods; their structure
is also unveiled, thus leading to a concise description of the features
obtained via training. For the special case of a sign activation function, our
analysis establishes the fundamental limits for the lossy compression of
Gaussian sources via (shallow) autoencoders. Finally, while the results are
proved for Gaussian data, numerical simulations on standard datasets display
the universality of the theoretical predictions.
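The setting described above — a two-layer autoencoder whose latent size scales proportionally with the input dimension, trained with gradient methods on Gaussian data — can be illustrated with a minimal sketch. Everything in the snippet (dimensions, batch size, the steep-tanh surrogate for the sign activation, the optimizer and its hyperparameters) is an illustrative assumption, not the construction analyzed in the paper.

```python
import torch

# Proportional regime (illustrative sizes): latent dimension m scales linearly
# with the input dimension d, m = rate * d.
d, rate = 256, 0.5
m = int(rate * d)

# Two-layer autoencoder: x -> sigma(W1 x) -> W2 sigma(W1 x), with untied weights.
# The sign activation studied in the paper is non-differentiable, so this sketch
# uses a steep tanh as a smooth surrogate when training with gradient methods.
beta = 5.0
encoder = torch.nn.Linear(d, m, bias=False)
decoder = torch.nn.Linear(m, d, bias=False)
opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(2000):
    x = torch.randn(128, d)              # isotropic Gaussian source
    z = torch.tanh(beta * encoder(x))    # smooth stand-in for sign(W1 x)
    x_hat = decoder(z)
    loss = ((x - x_hat) ** 2).mean()     # empirical proxy for the population risk
    opt.zero_grad()
    loss.backward()
    opt.step()
```

As beta grows the surrogate approaches the sign activation, and the compression rate is controlled by the ratio m/d, which stays fixed as both dimensions grow — the proportional regime the paper analyzes.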
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards
General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- The dynamics of representation learning in shallow, non-linear
autoencoders [3.1219977244201056]
We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
arXiv Detail & Related papers (2022-01-06T15:57:31Z)
- On the Regularization of Autoencoders [14.46779433267854]
We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model capacity of the learned autoencoder.
We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last layer.
We demonstrate that it is an accurate approximation across all model ranks in our experiments on three well-known datasets.
arXiv Detail & Related papers (2021-10-21T18:28:25Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Implicit Greedy Rank Learning in Autoencoders via Overparameterized
Linear Networks [7.412225511828064]
Deep linear networks trained with gradient descent yield low-rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
arXiv Detail & Related papers (2021-07-02T23:17:50Z)
- Implicit Rank-Minimizing Autoencoder [21.2045061949013]
Implicit Rank-Minimizing Autoencoder (IRMAE) is simple, deterministic, and learns compact latent spaces.
We demonstrate the validity of the method on several image generation and representation learning tasks.
arXiv Detail & Related papers (2020-10-01T20:48:52Z)
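The last two entries both rely on the implicit bias of gradient descent on over-parameterized linear networks toward low-rank solutions. A minimal sketch of the idea, assuming a PyTorch-style model: extra bias-free linear layers are chained at the bottleneck, so that training implicitly reduces the rank of the latent code. The layer sizes and the number of extra layers below are illustrative assumptions, not the configuration used in either paper.

```python
import torch.nn as nn

latent_dim, num_extra_linear = 64, 4   # illustrative choices only

# Encoder ending in a chain of bias-free linear layers; the over-parameterized
# linear chain biases gradient descent toward low-rank latent representations.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
    *[nn.Linear(latent_dim, latent_dim, bias=False) for _ in range(num_extra_linear)],
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784),
)
```

At inference time the chain of linear layers collapses into a single matrix, so the extra layers add no expressive power; their only effect is on the training dynamics.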
This list is automatically generated from the titles and abstracts of the papers on this site.