The DEformer: An Order-Agnostic Distribution Estimating Transformer
- URL: http://arxiv.org/abs/2106.06989v1
- Date: Sun, 13 Jun 2021 13:33:31 GMT
- Title: The DEformer: An Order-Agnostic Distribution Estimating Transformer
- Authors: Michael A. Alcorn, Anh Nguyen
- Abstract summary: Order-agnostic autoregressive distribution estimation (OADE) is a challenging problem in generative machine learning.
We propose an alternative approach for encoding feature identities, where each feature's identity is included alongside its value in the input.
We show that a Transformer trained on this input can effectively model binarized-MNIST, approaching the average negative log-likelihood of fixed-order autoregressive distribution estimating algorithms.
- Score: 17.352818121007576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Order-agnostic autoregressive distribution estimation (OADE), i.e.,
autoregressive distribution estimation where the features can occur in an
arbitrary order, is a challenging problem in generative machine learning. Prior
work on OADE has encoded feature identity (e.g., pixel location) by assigning
each feature to a distinct fixed position in an input vector. As a result,
architectures built for these inputs must strategically mask either the input
or model weights to learn the various conditional distributions necessary for
inferring the full joint distribution of the dataset in an order-agnostic way.
In this paper, we propose an alternative approach for encoding feature
identities, where each feature's identity is included alongside its value in
the input. This feature identity encoding strategy allows neural architectures
designed for sequential data to be applied to the OADE task without
modification. As a proof of concept, we show that a Transformer trained on this
input (which we refer to as "the DEformer", i.e., the distribution estimating
Transformer) can effectively model binarized-MNIST, approaching the average
negative log-likelihood of fixed-order autoregressive distribution estimating
algorithms while still being entirely order-agnostic.
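To make the encoding concrete, here is a minimal NumPy sketch of the input construction the abstract describes; the function and variable names are ours, not from the paper's code, and details such as how identity and value are interleaved may differ from the authors' implementation.

```python
import numpy as np

def deformer_input(image, rng):
    """Build an order-agnostic input sequence of (feature identity, value) pairs.

    image: (28, 28) binary array, e.g., a binarized-MNIST digit.
    Returns a (784, 2) array whose rows pair a pixel's index (its identity)
    with its value, presented in a random order, so a sequence model can
    condition on any subset of previously revealed pixels.
    """
    values = image.reshape(-1)            # flatten to 784 pixel values
    identities = np.arange(values.size)   # pixel location, encoded explicitly
    order = rng.permutation(values.size)  # a fresh random ordering per example
    return np.stack([identities[order], values[order]], axis=1)

rng = np.random.default_rng(0)
img = (rng.random((28, 28)) > 0.5).astype(np.int64)  # stand-in for a real digit
seq = deformer_input(img, rng)
print(seq.shape)  # (784, 2)
```

Because each feature's identity travels with its value, an off-the-shelf autoregressive Transformer can be trained on a fresh random ordering per example, and the chain rule over that ordering yields a valid log-likelihood for any feature order.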
Related papers
- AI-Powered Bayesian Inference [0.0]
Generative Artificial Intelligence (GAI) has heralded an inflection point that changed how society thinks about knowledge acquisition.
While GAI cannot be fully trusted for decision-making, it may still provide valuable information that can be integrated into a decision pipeline.
The variability of answers to given prompts can be leveraged to construct a prior distribution that reflects the assuredness of AI predictions.
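As a rough illustration of that idea (our own sketch, not the paper's method), repeated answers to the same prompt can be summarized into a prior whose spread encodes the model's assuredness:

```python
import numpy as np

# Hypothetical numeric answers from repeatedly prompting a generative model.
answers = np.array([4.1, 3.8, 4.4, 5.0, 4.0])

# A Gaussian prior centered on the consensus answer; a tight spread signals
# an assured model, a wide spread an uncertain one.
mu, sigma = answers.mean(), answers.std(ddof=1)
print(f"prior: N({mu:.2f}, {sigma:.2f}^2)")
```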
arXiv Detail & Related papers (2025-02-26T15:42:06Z)
- Entropy-Lens: The Information Signature of Transformer Computations [14.613982627206884]
We study the evolution of token-level distributions directly in vocabulary space.
We compute the Shannon entropy of each intermediate predicted distribution, yielding one interpretable scalar per layer.
We introduce Entropy-Lens, a model-agnostic framework that extracts entropy profiles from frozen, off-the-shelf transformers.
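A minimal sketch of the per-layer entropy computation described above (ours, assuming intermediate logits have already been obtained, e.g., by pushing each layer's hidden state through the unembedding matrix):

```python
import numpy as np

def entropy_profile(layer_logits):
    """Shannon entropy (in nats) of each layer's predicted token distribution.

    layer_logits: list of 1-D vocabulary-logit vectors, one per intermediate
    layer. Returns one interpretable scalar per layer.
    """
    profile = []
    for logits in layer_logits:
        logits = logits - logits.max()              # numerical stability
        p = np.exp(logits) / np.exp(logits).sum()   # softmax
        profile.append(-(p * np.log(p + 1e-12)).sum())
    return np.array(profile)

rng = np.random.default_rng(0)
fake_logits = [rng.normal(size=50_000) * s for s in (0.1, 0.5, 2.0)]
print(entropy_profile(fake_logits))  # sharper logits give lower entropy
```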
arXiv Detail & Related papers (2025-02-23T13:33:27Z)
- Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior [5.862123282894087]
We build on the Vector Quantized Variational Autoencoder (VQ-VAE).
VQ-VAE is a type of variational autoencoder that uses a discrete embedding as its latent.
We show that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafted heuristics.
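For context, here is a bare-bones sketch of the vector-quantization step shared by VQ-VAE-style models (our sketch; the paper's Gaussian-mixture refinements are not shown):

```python
import numpy as np

def quantize(z, codebook):
    """Map each encoder output to its nearest codebook vector.

    z:        (n, d) continuous encoder outputs.
    codebook: (k, d) discrete embedding vectors.
    Returns the chosen indices and the quantized latents.
    """
    # squared Euclidean distance from every latent to every code
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codes = rng.normal(size=(512, 64))   # k = 512 codes of dimension 64
latents = rng.normal(size=(10, 64))
idx, zq = quantize(latents, codes)
print(idx[:5], zq.shape)             # discrete indices and (10, 64) quantized latents
```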
arXiv Detail & Related papers (2024-10-14T05:58:11Z)
- VeriFlow: Modeling Distributions for Neural Network Verification [4.3012765978447565]
Formal verification has emerged as a promising method to ensure the safety and reliability of neural networks.
We propose the VeriFlow architecture, a flow-based density model tailored to allow any verification approach to restrict its search to some data distribution of interest.
arXiv Detail & Related papers (2024-06-20T12:41:39Z)
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z)
- Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa.
We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv Detail & Related papers (2023-07-19T10:27:34Z)
- A Simple Strategy to Provable Invariance via Orbit Mapping [14.127786615513978]
We propose a method to make network architectures provably invariant with respect to group actions.
In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network.
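The idea can be illustrated with the permutation group, where sorting maps every orbit to a canonical representative (our toy sketch, not the paper's construction):

```python
import numpy as np

def canonicalize(x):
    """Orbit mapping for the permutation group: sort the input.

    Any permutation of x lands on the same sorted representative, so
    f(canonicalize(x)) is permutation-invariant for *any* network f,
    with no change to f itself.
    """
    return np.sort(x)

def f(x):  # an arbitrary, otherwise non-invariant model
    return np.tanh(x) @ np.arange(1, x.size + 1)

x = np.array([3.0, -1.0, 2.0])
perm = np.array([2, 0, 1])
print(f(canonicalize(x)), f(canonicalize(x[perm])))  # identical outputs
```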
arXiv Detail & Related papers (2022-09-24T03:40:42Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
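A schematic of that randomization step (our sketch; the paper's certification machinery on top of it is omitted):

```python
import numpy as np

def smoothed_predict(model, x, sample_shift, n=100, rng=None):
    """Average a model's prediction over random transformations of its input.

    sample_shift(rng) draws a random transformation parameter (here, an
    additive shift); the averaged classifier inherits robustness to bounded
    shifts of the data distribution under that transformation family.
    """
    rng = rng or np.random.default_rng(0)
    votes = np.mean([model(x + sample_shift(rng)) for _ in range(n)], axis=0)
    return votes.argmax()

def model(x):  # toy 2-class scorer
    return np.array([x.sum(), -x.sum()])

x = np.array([0.2, -0.1])
pred = smoothed_predict(model, x, lambda rng: rng.normal(scale=0.5, size=2))
print(pred)
```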
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Training or Architecture? How to Incorporate Invariance in Neural Networks [14.162739081163444]
We propose a method for provably invariant network architectures with respect to group actions.
In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network.
We analyze properties of such approaches, extend them to equivariant networks, and demonstrate their advantages in terms of robustness as well as computational efficiency in several numerical examples.
arXiv Detail & Related papers (2021-06-18T10:31:00Z)
- Probabilistic Kolmogorov-Arnold Network [1.4732811715354455]
The present paper proposes a method for estimating probability distributions of the outputs in the case of aleatoric uncertainty.
The suggested approach covers input-dependent probability distributions of the outputs, as well as the variation of the distribution type with the inputs.
Although the method is applicable to any regression model, the present paper combines it with KANs, since the specific structure of KANs leads to computationally efficient model construction.
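One common instantiation of input-dependent output distributions, shown here as a hedged sketch rather than the paper's method, is to have the model emit per-input Gaussian parameters and train with the Gaussian negative log-likelihood:

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of y under N(mu, exp(log_var)).

    Letting the model emit both mu(x) and log_var(x) gives an
    input-dependent output distribution rather than a point estimate.
    """
    return 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var) + np.log(2 * np.pi))

# Stand-ins for a network head emitting two numbers per input.
x, y = 0.7, 1.3
mu, log_var = 2.0 * x, np.log(0.1 + x ** 2)
print(gaussian_nll(y, mu, log_var))
```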
arXiv Detail & Related papers (2021-04-04T23:49:15Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of a VAE's lack of self-consistency (the encoder failing to map the decoder's outputs back to the corresponding latents) on the learned representations, as well as the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity, and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
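A schematic of the Mixture of Expert prediction step (our sketch; the paper's identification procedure and region definitions are not shown): a gating function softly assigns the regressor to a region, and each region has its own affine ARX expert.

```python
import numpy as np

def moe_predict(x, gate_W, expert_A, expert_b):
    """Mixture-of-experts prediction with affine experts.

    x:        regressor vector (lagged outputs and exogenous inputs).
    gate_W:   (k, d) gating weights, softmaxed into region probabilities.
    expert_A: (k, d) one affine map per expert; expert_b: (k,) offsets.
    """
    logits = gate_W @ x
    w = np.exp(logits - logits.max())
    w /= w.sum()                       # soft assignment of x to the k regions
    preds = expert_A @ x + expert_b    # each expert's affine prediction
    return w @ preds

rng = np.random.default_rng(0)
d, k = 4, 3
x = rng.normal(size=d)
print(moe_predict(x, rng.normal(size=(k, d)), rng.normal(size=(k, d)), rng.normal(size=k)))
```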
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.