Transformer Neural Autoregressive Flows
- URL: http://arxiv.org/abs/2401.01855v1
- Date: Wed, 3 Jan 2024 17:51:16 GMT
- Title: Transformer Neural Autoregressive Flows
- Authors: Massimiliano Patacchiola, Aliaksandra Shysheya, Katja Hofmann, Richard
E. Turner
- Abstract summary: Density estimation can be performed using Normalizing Flows (NFs).
We propose a novel solution by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs).
- Score: 48.68932811531102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density estimation, a central problem in machine learning, can be performed
using Normalizing Flows (NFs). NFs comprise a sequence of invertible
transformations that turn a complex target distribution into a simple one by
exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs)
and Block Neural Autoregressive Flows (B-NAFs) are arguably the most performant
members of the NF family. However, they suffer from scalability issues and training
instability due to the constraints imposed on the network structure. In this
paper, we propose a novel solution to these challenges by exploiting
transformers to define a new class of neural flows called Transformer Neural
Autoregressive Flows (T-NAFs). T-NAFs treat each dimension of a random variable
as a separate input token, using attention masking to enforce an autoregressive
constraint. We take an amortization-inspired approach where the transformer
outputs the parameters of an invertible transformation. The experimental
results demonstrate that T-NAFs consistently match or outperform NAFs and
B-NAFs across multiple datasets from the UCI benchmark. Remarkably, T-NAFs
achieve these results using an order of magnitude fewer parameters than
previous approaches, without composing multiple flows.
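As a brief restatement in standard flow notation (not quoted from the paper): for an invertible map $z = f(x)$ with base density $p_Z$, the change of variables theorem gives $\log p_X(x) = \log p_Z(f(x)) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|$. An autoregressive flow constrains each output to $z_i = f_i(x_i; \theta_i(x_{<i}))$, so the Jacobian is triangular and the log-determinant reduces to $\sum_{i=1}^{D} \log\left|\partial z_i / \partial x_i\right|$. Per the abstract, T-NAFs use an attention-masked transformer to produce the parameters $\theta_i$ from the preceding dimensions.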
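Below is a minimal, hedged sketch in PyTorch of the mechanism described in the abstract: each dimension of x becomes one input token, a causal attention mask together with a one-step input shift enforces the autoregressive constraint, and the transformer outputs the parameters of a per-dimension invertible map. The class name, the learned start token, and the affine transformation are illustrative assumptions rather than the authors' implementation; the paper's invertible transformation is more expressive, but an affine map keeps the change-of-variables term easy to read.

```python
# Hedged sketch of a T-NAF-style density estimator (assumptions noted above).
import math
import torch
import torch.nn as nn


class TNAFSketch(nn.Module):
    """Each dimension of x is one token; a causal mask plus a one-step input
    shift keeps the parameters for dimension i a function of x_{<i} only."""

    def __init__(self, dim: int, d_model: int = 64, nhead: int = 4, nlayers: int = 2):
        super().__init__()
        self.dim = dim
        self.embed = nn.Linear(1, d_model)                      # scalar dimension -> token
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))   # token feeding dimension 1
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        self.to_params = nn.Linear(d_model, 2)                  # (log_scale, shift) per dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Return log p(x) under a standard-normal base density."""
        B, D = x.shape
        tokens = self.embed(x.unsqueeze(-1))                                 # (B, D, d_model)
        # Shift right so the parameters for dimension i never see x_i itself.
        tokens = torch.cat([self.start.expand(B, 1, -1), tokens[:, :-1]], dim=1)
        # Boolean mask: True above the diagonal blocks attention to future tokens.
        causal = torch.triu(torch.ones(D, D, dtype=torch.bool), diagonal=1)
        h = self.encoder(tokens, mask=causal)                                # (B, D, d_model)
        log_scale, shift = self.to_params(h).unbind(-1)                      # (B, D) each
        z = x * torch.exp(log_scale) + shift                                 # invertible per-dim map
        # Change of variables: log p(x) = log N(z; 0, I) + sum_i log |dz_i/dx_i|
        log_base = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(-1)
        return log_base + log_scale.sum(-1)


if __name__ == "__main__":
    model = TNAFSketch(dim=8)
    x = torch.randn(16, 8)
    print(model(x).shape)  # torch.Size([16]) of log-densities
```

Because the Jacobian of this map is triangular with diagonal entries exp(log_scale_i), the log-determinant is simply the sum of the predicted log-scales, which is what the last line of forward adds to the base log-density.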
Related papers
- Equivariant Neural Functional Networks for Transformers [2.3963215252605172]
This paper systematically explores neural functional networks (NFN) for transformer architectures.
NFNs are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data.
arXiv Detail & Related papers (2024-10-05T15:56:57Z) - Entropy-Informed Weighting Channel Normalizing Flow [7.751853409569806]
We propose a regularized and feature-dependent $\mathtt{Shuffle}$ operation and integrate it into a vanilla multi-scale architecture.
We observe that this operation guides the variables to evolve in the direction of entropy increase, hence we refer to NFs with the $\mathtt{Shuffle}$ operation as Entropy-Informed Weighting Channel Normalizing Flow (EIW-Flow).
arXiv Detail & Related papers (2024-07-06T04:46:41Z) - Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot [50.16171384920963]
The transformer architecture has prevailed in various deep learning settings.
A one-layer transformer trained with gradient descent provably learns the sparse token selection task.
arXiv Detail & Related papers (2024-06-11T02:15:53Z) - Trained Transformers Learn Linear Models In-Context [39.56636898650966]
Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL).
We show that when transformers are trained over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares.
arXiv Detail & Related papers (2023-06-16T15:50:03Z) - Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Factorized Fourier Neural Operators [77.47313102926017]
The Factorized Fourier Neural Operator (F-FNO) is a learning-based method for simulating partial differential equations.
We show that our model maintains an error rate of 2% while still running an order of magnitude faster than a numerical solver.
arXiv Detail & Related papers (2021-11-27T03:34:13Z) - Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs where the base density to output space mapping is conditioned on an input x, to model conditional densities p(y|x).
arXiv Detail & Related papers (2019-11-29T19:17:58Z)