Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods
- URL: http://arxiv.org/abs/2212.13468v1
- Date: Tue, 27 Dec 2022 12:37:34 GMT
- Title: Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods
- Authors: Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli
- Abstract summary: This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
- Score: 91.54785981649228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoencoders are a popular model in many branches of machine learning and
lossy data compression. However, their fundamental limits, the performance of
gradient methods and the features learnt during optimization remain poorly
understood, even in the two-layer setting. In fact, earlier work has considered
either linear autoencoders or specific training regimes (leading to vanishing
or diverging compression rates). Our paper addresses this gap by focusing on
non-linear two-layer autoencoders trained in the challenging proportional
regime in which the input dimension scales linearly with the size of the
representation. Our results characterize the minimizers of the population risk,
and show that such minimizers are achieved by gradient methods; their structure
is also unveiled, thus leading to a concise description of the features
obtained via training. For the special case of a sign activation function, our
analysis establishes the fundamental limits for the lossy compression of
Gaussian sources via (shallow) autoencoders. Finally, while the results are
proved for Gaussian data, numerical simulations on standard datasets display
the universality of the theoretical predictions.
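The setting described above — a two-layer autoencoder whose latent size scales proportionally with the input dimension, trained with gradient methods on Gaussian data — can be illustrated with a minimal sketch. Everything in the snippet (dimensions, batch size, the steep-tanh surrogate for the sign activation, the optimizer and its hyperparameters) is an illustrative assumption, not the construction analyzed in the paper.

```python
import torch

# Proportional regime (illustrative sizes): latent dimension m scales linearly
# with the input dimension d, m = rate * d.
d, rate = 256, 0.5
m = int(rate * d)

# Two-layer autoencoder: x -> sigma(W1 x) -> W2 sigma(W1 x), with untied weights.
# The sign activation studied in the paper is non-differentiable, so this sketch
# uses a steep tanh as a smooth surrogate when training with gradient methods.
beta = 5.0
encoder = torch.nn.Linear(d, m, bias=False)
decoder = torch.nn.Linear(m, d, bias=False)
opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(2000):
    x = torch.randn(128, d)              # isotropic Gaussian source
    z = torch.tanh(beta * encoder(x))    # smooth stand-in for sign(W1 x)
    x_hat = decoder(z)
    loss = ((x - x_hat) ** 2).mean()     # empirical proxy for the population risk
    opt.zero_grad()
    loss.backward()
    opt.step()
```

As beta grows the surrogate approaches the sign activation, and the compression rate is controlled by the ratio m/d, which stays fixed as both dimensions grow — the proportional regime the paper analyzes.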
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards
General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- The dynamics of representation learning in shallow, non-linear
autoencoders [3.1219977244201056]
We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
arXiv Detail & Related papers (2022-01-06T15:57:31Z)
- On the Regularization of Autoencoders [14.46779433267854]
We show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model capacity of the learned autoencoder.
We derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last layer.
We demonstrate that it is an accurate approximation across all model ranks in our experiments on three well-known datasets.
arXiv Detail & Related papers (2021-10-21T18:28:25Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Implicit Greedy Rank Learning in Autoencoders via Overparameterized
Linear Networks [7.412225511828064]
Deep linear networks trained with gradient descent yield low-rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
arXiv Detail & Related papers (2021-07-02T23:17:50Z)
- Implicit Rank-Minimizing Autoencoder [21.2045061949013]
Implicit Rank-Minimizing Autoencoder (IRMAE) is simple, deterministic, and learns compact latent spaces.
We demonstrate the validity of the method on several image generation and representation learning tasks.
arXiv Detail & Related papers (2020-10-01T20:48:52Z)
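The last two entries both rely on the implicit bias of gradient descent on over-parameterized linear networks toward low-rank solutions. A minimal sketch of the idea, assuming a PyTorch-style model: extra bias-free linear layers are chained at the bottleneck, so that training implicitly reduces the rank of the latent code. The layer sizes and the number of extra layers below are illustrative assumptions, not the configuration used in either paper.

```python
import torch.nn as nn

latent_dim, num_extra_linear = 64, 4   # illustrative choices only

# Encoder ending in a chain of bias-free linear layers; the over-parameterized
# linear chain biases gradient descent toward low-rank latent representations.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
    *[nn.Linear(latent_dim, latent_dim, bias=False) for _ in range(num_extra_linear)],
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784),
)
```

At inference time the chain of linear layers collapses into a single matrix, so the extra layers add no expressive power; their only effect is on the training dynamics.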
This list is automatically generated from the titles and abstracts of the papers on this site.