Distributional autoencoders know the score
- URL: http://arxiv.org/abs/2502.11583v1
- Date: Mon, 17 Feb 2025 09:16:25 GMT
- Title: Distributional autoencoders know the score
- Authors: Andrej Leban
- Abstract summary: We show that the level sets of the encoder orient themselves exactly with regard to the score of the data distribution.
In settings where the score itself has physical meaning, we demonstrate that the method can recover scientifically important quantities.
The fact that the method is learning the score means that it could have promise as a generative model.
- Score: 0.0
- License:
- Abstract: This work presents novel and desirable properties of a recently introduced class of autoencoders, the Distributional Principal Autoencoder (DPA), which combines distributionally correct reconstruction with principal components-like interpretability of the encodings. First, we show that the level sets of the encoder orient themselves exactly with regard to the score of the data distribution. This both explains the method's often remarkable performance in disentangling the factors of variation of the data and opens up possibilities of recovering its distribution while having access to samples only. In settings where the score itself has physical meaning (such as when the data obey the Boltzmann distribution), we demonstrate that the method can recover scientifically important quantities such as the minimum free energy path. Second, we show that if the data lie on a manifold that can be approximated by the encoder, the optimal encoder's components beyond the dimension of the manifold will carry absolutely no additional information about the data distribution. This promises new ways of determining the number of relevant dimensions of the data beyond common heuristics such as the scree plot. Finally, the fact that the method is learning the score means that it could have promise as a generative model, potentially rivaling approaches such as diffusion, which similarly attempts to approximate the score of the data distribution.
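For data following a Boltzmann distribution p(x) proportional to exp(-E(x)/kT), the score is grad log p(x) = -grad E(x)/kT, i.e. the force scaled by temperature, which is why the score has physical meaning there. The sketch below illustrates that identity and how a score alone can drive sampling through unadjusted Langevin dynamics, the mechanism score-based generative models rely on. It does not implement DPA; the double-well energy, kT = 0.5, and the helper names `boltzmann_score` and `langevin_sample` are illustrative assumptions.

```python
import numpy as np


def double_well_energy(x):
    """Hypothetical 1-D double-well potential E(x) = (x^2 - 1)^2 (illustrative only)."""
    return (x**2 - 1.0) ** 2


def double_well_grad(x):
    """Analytic derivative dE/dx = 4x(x^2 - 1)."""
    return 4.0 * x * (x**2 - 1.0)


def boltzmann_score(grad_energy, kT):
    """Score of p(x) proportional to exp(-E(x)/kT): grad log p(x) = -grad E(x) / kT."""
    return lambda x: -grad_energy(x) / kT


def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics driven only by the score.

    Update: x <- x + step * score(x) + sqrt(2 * step) * N(0, 1).
    Each entry of x0 is treated as an independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    kT = 0.5
    score = boltzmann_score(double_well_grad, kT)

    # Compare the analytic score with a central finite difference of log p = -E/kT.
    x, h = 0.3, 1e-5
    fd = -(double_well_energy(x + h) - double_well_energy(x - h)) / (2 * h) / kT
    print("score vs finite difference:", score(x), fd)

    # 2000 chains started at the barrier top; mass drifts toward the wells at x = +/-1.
    rng = np.random.default_rng(0)
    samples = langevin_sample(score, np.zeros(2000), step=1e-3, n_steps=5000, rng=rng)
    print("mean |x| of samples:", np.abs(samples).mean())
```

The step size trades accuracy for speed: unadjusted Langevin has a small bias that shrinks with `step`.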
Related papers
- Watermarking Generative Categorical Data [9.087950471621653]
Our method embeds secret signals by splitting the data distribution into two components and modifying one distribution based on a deterministic relationship with the other.
To verify the watermark, we introduce an insertion inverse algorithm and detect its presence by measuring the total variation distance between the inverse-decoded data and the original distribution.
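The detection statistic named above is the total variation (TV) distance, TV(p, q) = 0.5 * sum_i |p_i - q_i| for categorical distributions. A minimal sketch of computing it from decoded labels follows; the example data are hypothetical, and the watermark insertion and inverse-decoding steps, as well as any decision threshold, are not reproduced.

```python
import numpy as np


def empirical_distribution(labels, n_categories):
    """Normalized histogram of integer category labels 0..n_categories-1."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=n_categories)
    return counts / counts.sum()


def total_variation(p, q):
    """Total variation distance between two categorical distributions.

    TV(p, q) = 0.5 * sum_i |p_i - q_i|, which lies in [0, 1].
    Inputs are renormalized, so raw counts are also accepted.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must be defined over the same categories")
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()


if __name__ == "__main__":
    reference = [0.5, 0.3, 0.2]                              # hypothetical original distribution
    decoded = empirical_distribution([0, 0, 1, 2, 2, 2], 3)  # hypothetical inverse-decoded data
    print(total_variation(decoded, reference))
```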
arXiv Detail & Related papers (2024-11-16T21:57:45Z)
- Learned Compression of Encoding Distributions [1.4732811715354455]
The entropy bottleneck is a common component used in many learned compression models.
We propose a method that adapts the encoding distribution to match the latent data distribution for a specific input.
Our method achieves a Bjontegaard-Delta (BD)-rate gain of -7.10% on the Kodak test dataset.
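The BD-rate figure quoted above summarizes the average bitrate change at equal quality between two rate-distortion curves. A minimal sketch of the classic recipe follows: fit log(rate) as a cubic in PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the mean log-rate gap to a percentage. Using PSNR as the quality axis and the sample numbers are assumptions; this is not the paper's evaluation script.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-Delta rate (percent) of a test codec relative to an anchor.

    Negative values mean the test codec needs less bitrate for the same PSNR.
    Natural log / exp is equivalent to the usual log10 / 10**x formulation.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Cubic fits of log-rate as a function of PSNR.
    pa = np.polyfit(psnr_anchor, log_ra, 3)
    pt = np.polyfit(psnr_test, log_rt, 3)
    # Integrate only over the PSNR interval both curves cover.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_t = (np.polyval(it, hi) - np.polyval(it, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (kbps) / PSNR (dB) points for two codecs.
    anchor_rate, anchor_psnr = [100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0]
    test_rate, test_psnr = [95, 185, 370, 740], [30.1, 33.2, 36.1, 39.0]
    print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):.2f}%")
```

A uniform 10% rate reduction at every PSNR point yields a BD-rate of -10%, which is a handy sanity check.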
arXiv Detail & Related papers (2024-06-18T21:05:51Z)
- Distributional Principal Autoencoders [2.519266955671697]
Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data.
We propose Distributional Principal Autoencoder (DPA) that consists of an encoder that maps high-dimensional data to low-dimensional latent variables.
For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value.
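Matching a conditional distribution from samples needs a loss that scores the whole spread of decoder outputs, not only their mean. One standard choice is the energy score, ES = E||Y - x|| - 0.5 * E||Y - Y'||, estimated from several decoder samples Y for the same latent value. The sketch below is an assumed stand-in: the excerpt does not state DPA's exact objective, and `energy_score` is a hypothetical helper.

```python
import numpy as np


def energy_score(samples, x):
    """Monte Carlo energy score for one observation x (lower is better).

    ES = mean_i ||Y_i - x|| - 0.5 * mean_{i != j} ||Y_i - Y_j||,
    where Y_1..Y_m are samples drawn for the same latent code.
    """
    y = np.asarray(samples, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat a 1-D array as m scalar samples
    x = np.asarray(x, dtype=float).reshape(1, -1)
    m = y.shape[0]
    if m < 2:
        raise ValueError("need at least two samples")
    fit = np.linalg.norm(y - x, axis=1).mean()
    pairwise = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    spread = pairwise.sum() / (m * (m - 1))  # diagonal is zero, so this averages over i != j
    return fit - 0.5 * spread


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([0.0, 0.0])
    sharp = rng.normal(scale=0.1, size=(200, 2))   # hypothetical decoder samples
    wide = rng.normal(scale=3.0, size=(200, 2))
    print(energy_score(sharp, x), energy_score(wide, x))
```

The second term rewards diversity, so collapsing all samples onto the conditional mean is not optimal unless the true conditional distribution is a point mass.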
arXiv Detail & Related papers (2024-04-21T12:52:04Z)
- Beyond the Known: Adversarial Autoencoders in Novelty Detection [2.7486022583843233]
In novelty detection, the goal is to decide if a new data point should be categorized as an inlier or an outlier.
We use a similar framework but with a lightweight deep network, and we adopt a probabilistic score with reconstruction error.
Our results indicate that our approach is effective at learning the target class, and it outperforms recent state-of-the-art methods on several benchmark datasets.
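Reconstruction-error novelty detection fits a model on inliers and flags points it reconstructs poorly. The sketch below swaps in PCA as a linear stand-in for the autoencoder and thresholds at a quantile of training errors; both choices, and the class name `PCAReconstructionDetector`, are assumptions rather than the paper's adversarial-autoencoder method or its probabilistic score.

```python
import numpy as np


class PCAReconstructionDetector:
    """Linear stand-in for an autoencoder: project onto the top-k principal directions."""

    def __init__(self, k, quantile=0.95):
        self.k = k
        self.quantile = quantile

    def fit(self, inliers):
        x = np.asarray(inliers, dtype=float)
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal directions, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]
        # Threshold = chosen quantile of reconstruction errors on the training inliers.
        self.threshold_ = np.quantile(self.score(x), self.quantile)
        return self

    def score(self, x):
        """Squared reconstruction error of each row (higher = more novel)."""
        xc = np.asarray(x, dtype=float) - self.mean_
        recon = xc @ self.components_.T @ self.components_
        return ((xc - recon) ** 2).sum(axis=1)

    def predict_outlier(self, x):
        return self.score(x) > self.threshold_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical inliers: noisy points along a line in 3-D.
    t = rng.normal(size=500)
    inliers = np.outer(t, [1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=(500, 3))
    det = PCAReconstructionDetector(k=1).fit(inliers)
    print(det.predict_outlier(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 5.0]])))
```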
arXiv Detail & Related papers (2024-04-06T00:04:19Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
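The summary only says the latent space is "organized". One concrete reading, assumed here, is that each latent dimension is snapped to the nearest entry of its own small scalar codebook. The sketch shows only that forward quantization step with a hypothetical helper `quantize_latents`; training would additionally need a gradient workaround such as a straight-through estimator, which is not shown.

```python
import numpy as np


def quantize_latents(z, codebooks):
    """Snap each latent dimension to the nearest value in its own scalar codebook.

    z: (n, d) continuous latents; codebooks: list of d 1-D arrays of allowed values.
    Returns the quantized latents and the per-dimension code indices.
    """
    z = np.asarray(z, dtype=float)
    n, d = z.shape
    if len(codebooks) != d:
        raise ValueError("need one codebook per latent dimension")
    zq = np.empty_like(z)
    idx = np.empty((n, d), dtype=int)
    for j, book in enumerate(codebooks):
        book = np.asarray(book, dtype=float)
        dist = np.abs(z[:, j, None] - book[None, :])  # (n, len(book))
        idx[:, j] = dist.argmin(axis=1)
        zq[:, j] = book[idx[:, j]]
    return zq, idx


if __name__ == "__main__":
    z = np.array([[0.1, 0.9], [-0.6, 0.4]])
    books = [np.array([-1.0, 0.0, 1.0]), np.array([0.0, 0.5, 1.0])]  # hypothetical codebooks
    zq, idx = quantize_latents(z, books)
    print(zq)
    print(idx)
```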
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned either from unbalanced data or through mode collapse.
We argue that this bias is not only responsible for fairness concerns but also plays a key role in the collapse of latent-traversal editing methods when they deviate from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Neural Distributed Source Coding [59.630059301226474]
We present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions.
We evaluate our method on multiple datasets and show that it can handle complex correlations and achieves state-of-the-art PSNR.
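PSNR, the quality metric cited above, is 10 * log10(MAX^2 / MSE) in decibels, where MAX is the largest possible signal value. A minimal sketch follows; the default `max_value=255.0` assumes 8-bit images and should be changed for other signal ranges.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * np.log10(max_value**2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32)).astype(float)  # hypothetical 8-bit image
    noisy = np.clip(image + rng.normal(scale=5.0, size=image.shape), 0, 255)
    print(f"{psnr(image, noisy):.2f} dB")
```

Because PSNR is logarithmic in MSE, halving the MSE adds about 3 dB.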
arXiv Detail & Related papers (2021-06-05T04:50:43Z)
- Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders [1.313418334200599]
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection.
We propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset.
This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand.
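The core idea of Brownian offset sampling can be sketched as a random walk that starts at an in-distribution point and stops once it is at least `d_min` away from every in-distribution point. The version below uses a hard distance threshold and is a simplification: the soft acceptance that gives the paper's method its name is not reproduced, and the function and parameter names are hypothetical.

```python
import numpy as np


def brownian_offset_samples(data, n_samples, d_min, step_std, rng, max_steps=10_000):
    """Simplified (hard-threshold) Brownian offset sampling.

    For each output sample: start at a random in-distribution point, add Gaussian
    steps until the walker is at least d_min away from every in-distribution point.
    """
    data = np.asarray(data, dtype=float)
    out = []
    for _ in range(n_samples):
        y = data[rng.integers(len(data))].copy()
        for _ in range(max_steps):
            y = y + rng.normal(scale=step_std, size=y.shape)
            if np.linalg.norm(data - y, axis=1).min() >= d_min:
                out.append(y)
                break
        else:
            raise RuntimeError("walker did not leave the d_min neighbourhood")
    return np.array(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    in_dist = rng.normal(size=(200, 2))  # hypothetical in-distribution data
    ood = brownian_offset_samples(in_dist, n_samples=20, d_min=0.5, step_std=0.1, rng=rng)
    print(ood.shape)
```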
arXiv Detail & Related papers (2021-05-04T06:59:24Z)
- Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics [85.75352990739154]
We propose a novel domain adaptation method for the source-free setting.
We use batch normalization statistics stored in the pretrained model to approximate the distribution of unobserved source data.
Our method achieves competitive performance with state-of-the-art domain adaptation methods.
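The stored batch normalization running mean and variance describe, per channel, a Gaussian summary of the source features at that layer. One way to align target features with them, assumed here rather than taken from the paper, is a per-channel Gaussian KL divergence between target batch statistics and the stored source statistics. The NumPy helper `bn_alignment_loss` is hypothetical; in practice the loss would be computed in a differentiable framework.

```python
import numpy as np


def bn_alignment_loss(features, running_mean, running_var, eps=1e-5):
    """Per-channel KL( N(mu_t, var_t) || N(mu_s, var_s) ), averaged over channels.

    features: (n, c) target-domain activations entering one BN layer.
    running_mean / running_var: (c,) statistics stored in the pretrained source model.
    """
    f = np.asarray(features, dtype=float)
    mu_t = f.mean(axis=0)
    var_t = f.var(axis=0) + eps
    mu_s = np.asarray(running_mean, dtype=float)
    var_s = np.asarray(running_var, dtype=float) + eps
    kl = 0.5 * (np.log(var_s / var_t) + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0)
    return kl.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(loc=0.5, scale=1.2, size=(256, 4))  # hypothetical shifted features
    print(bn_alignment_loss(target, running_mean=np.zeros(4), running_var=np.ones(4)))
```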
arXiv Detail & Related papers (2021-01-19T14:22:33Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
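Self-consistency, as summarized in the last entry, asks that re-encoding a decoded sample recover the latent code that produced it. The sketch measures that gap, mean ||encode(decode(z)) - z||^2, for hypothetical deterministic `encode`/`decode` callables; it is not the paper's VAE training procedure, where encoder and decoder would be stochastic.

```python
import numpy as np


def self_consistency_penalty(encode, decode, z):
    """Mean squared distance between latent codes z and encode(decode(z))."""
    z = np.asarray(z, dtype=float)
    z_back = encode(decode(z))
    return np.mean(np.sum((z_back - z) ** 2, axis=1))


if __name__ == "__main__":
    # Hypothetical linear pair: W has orthonormal columns, so encode undoes decode.
    W = np.linalg.qr(np.random.default_rng(0).normal(size=(5, 2)))[0]  # (5, 2)

    def decode(z):
        return z @ W.T  # latent (n, 2) -> data (n, 5)

    def encode(x):
        return x @ W  # data (n, 5) -> latent (n, 2)

    z = np.random.default_rng(1).normal(size=(4, 2))
    print(self_consistency_penalty(encode, decode, z))  # close to 0 for this pair
```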