VTAE: Variational Transformer Autoencoder with Manifolds Learning
- URL: http://arxiv.org/abs/2304.00948v1
- Date: Mon, 3 Apr 2023 13:13:19 GMT
- Title: VTAE: Variational Transformer Autoencoder with Manifolds Learning
- Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Dacheng Tao,
Xuelong Li
- Abstract summary: Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesic computation and accurate interpolation on the Riemannian manifold can substantially improve the performance of deep generative models.
- Score: 144.0546653941249
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep generative models have demonstrated successful applications in learning
non-linear data distributions through a number of latent variables and these
models use a nonlinear function (generator) to map latent samples into the data
space. On the other hand, the nonlinearity of the generator implies that the
latent space shows an unsatisfactory projection of the data space, which
results in poor representation learning. This weak projection, however, can be
addressed by a Riemannian metric, and we show that geodesic computation and
accurate interpolations between data samples on the Riemannian manifold can
substantially improve the performance of deep generative models. In this paper,
a Variational spatial-Transformer AutoEncoder (VTAE) is proposed to minimize
geodesics on a Riemannian manifold and improve representation learning. In
particular, we carefully design the variational autoencoder with an encoded
spatial-Transformer to explicitly expand the latent variable model to data on a
Riemannian manifold, and obtain global context modelling. Moreover, to have
smooth and plausible interpolations while traversing between two different
objects' latent representations, we propose a geodesic interpolation network
unlike existing models that rely on linear interpolation, which yields inferior
performance. Experiments on benchmarks show that our proposed model can improve
predictive accuracy and versatility over a range of computer vision tasks,
including image interpolation and reconstruction.
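The core idea of the abstract, minimizing geodesics under the metric that the generator pulls back onto the latent space, can be illustrated with a small sketch. This is not the paper's implementation: the generator `g`, its dimensions, and the crude gradient-descent geodesic solver are all toy assumptions chosen to keep the example self-contained.

```python
import numpy as np

# Toy nonlinear "generator" g: R^2 -> R^3, standing in for a trained decoder.
def g(z):
    return np.array([z[0], z[1], np.sin(3.0 * z[0]) * np.cos(3.0 * z[1])])

def jacobian(z, eps=1e-6):
    # Numerical Jacobian of g at z (3x2), by central differences.
    J = np.zeros((3, 2))
    for k in range(2):
        dz = np.zeros(2); dz[k] = eps
        J[:, k] = (g(z + dz) - g(z - dz)) / (2 * eps)
    return J

def pullback_metric(z):
    # Riemannian metric induced on the latent space: G(z) = J(z)^T J(z).
    J = jacobian(z)
    return J.T @ J

def path_energy(path):
    # Discrete curve energy of a latent path, measured in data space.
    return sum(np.sum((g(path[i + 1]) - g(path[i])) ** 2)
               for i in range(len(path) - 1))

z0, z1 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
T = 16
path = np.linspace(z0, z1, T)       # linear interpolation as the baseline
lin_energy = path_energy(path)
G = pullback_metric(np.array([0.5, 0.5]))  # 2x2 symmetric metric at a point

# Relax interior points by finite-difference gradient descent on the energy
# (a crude geodesic solver; the endpoints z0, z1 stay fixed).
lr, eps = 0.02, 1e-5
for _ in range(200):
    grad = np.zeros_like(path)
    for i in range(1, T - 1):
        for k in range(2):
            d = np.zeros(2); d[k] = eps
            p_plus = path.copy();  p_plus[i] += d
            p_minus = path.copy(); p_minus[i] -= d
            grad[i, k] = (path_energy(p_plus) - path_energy(p_minus)) / (2 * eps)
    path -= lr * grad
geo_energy = path_energy(path)
print(lin_energy, geo_energy)
```

The relaxed path has energy no larger than the straight latent line, which is the sense in which geodesic interpolation produces smoother traversals in data space than linear interpolation.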
Related papers
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in phase-space dynamics.
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z)
- Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models [2.184775414778289]
Inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets.
We combine a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and an Ensemble Smoother with Multiple Data Assimilation (ES-MDA).
WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space and ES-MDA updates the latent variables by assimilating available measurements.
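The ES-MDA latent update described in this summary can be sketched in a few lines. This is a hedged illustration, not the paper's code: the trained WGAN-GP generator plus measurement operator is replaced by a fixed linear forward model `A` so the example stays self-contained and testable; the number of latent dimensions, ensemble size, and noise level are all assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in forward model: latent z -> predicted data d. In the paper this
# would be the WGAN-GP generator composed with the measurement operator.
A = rng.normal(size=(5, 3))
def forward(Z):            # Z: (n_ens, 3) -> predictions (n_ens, 5)
    return Z @ A.T

z_true = np.array([1.0, -0.5, 2.0])
d_obs = A @ z_true
C_e = 0.01 * np.eye(5)     # observation-error covariance

n_ens, n_a = 200, 4
alphas = [n_a] * n_a       # standard ES-MDA choice: sum(1/alpha_i) = 1
Z = rng.normal(size=(n_ens, 3))        # initial latent ensemble

for alpha in alphas:
    D = forward(Z)
    Zm, Dm = Z - Z.mean(0), D - D.mean(0)
    C_zd = Zm.T @ Dm / (n_ens - 1)     # latent/data cross-covariance (3x5)
    C_dd = Dm.T @ Dm / (n_ens - 1)     # predicted-data covariance (5x5)
    # Perturb observations with inflated noise, then update every member:
    # z_j <- z_j + C_zd (C_dd + alpha C_e)^{-1} (d_obs + eps_j - d_j)
    noise = rng.multivariate_normal(np.zeros(5), alpha * C_e, size=n_ens)
    K = C_zd @ np.linalg.inv(C_dd + alpha * C_e)
    Z = Z + (d_obs + noise - D) @ K.T

err = np.abs(Z.mean(0) - z_true).max()
print(err)
```

After the assimilation steps, the ensemble mean of the latent variables recovers the latent code consistent with the observations, which is the mechanism by which ES-MDA updates the WGAN-GP latent space in the paper.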
arXiv Detail & Related papers (2023-10-02T01:27:10Z)
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- RENs: Relevance Encoding Networks [0.0]
This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses the automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality.
We show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.
arXiv Detail & Related papers (2022-05-25T21:53:48Z)
- Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis [8.20832544370228]
We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold.
A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers.
Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
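The patch-sequence encoding that this summary describes reduces to scaled dot-product self-attention over the patch embeddings. A minimal single-head sketch, with toy sizes and random weights standing in for the SiT's learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_patches, d) patch embeddings; one head of scaled
    # dot-product self-attention over the whole sequence.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise patch affinities
    return softmax(scores) @ V                # attention-weighted mix

n, d = 6, 8                 # 6 surface patches, 8-dim embeddings (toy sizes)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

Each output row is a context-aware re-encoding of one patch; stacking several such (multi-head) layers gives the global context modelling that distinguishes these models from surface CNNs.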
arXiv Detail & Related papers (2022-03-30T15:56:11Z)
- Flow-based Generative Models for Learning Manifold to Manifold Mappings [39.60406116984869]
We introduce three kinds of invertible layers for manifold-valued data, which are analogous to their functionality in flow-based generative models.
We show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions.
arXiv Detail & Related papers (2020-12-18T02:19:18Z)
- Variational Autoencoder with Learned Latent Structure [4.41370484305827]
We introduce the Variational Autoencoder with Learned Latent Structure (VAELLS).
VAELLS incorporates a learnable manifold model into the latent space of a VAE.
We validate our model on examples with known latent structure and also demonstrate its capabilities on a real-world dataset.
arXiv Detail & Related papers (2020-06-18T14:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.