Disentangling Multiple Features in Video Sequences using Gaussian
Processes in Variational Autoencoders
- URL: http://arxiv.org/abs/2001.02408v3
- Date: Sun, 19 Jul 2020 14:31:01 GMT
- Authors: Sarthak Bhagat, Shagun Uppal, Zhuyun Yin and Nengli Lim
- Abstract summary: We introduce MGP-VAE, a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences.
We use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data.
- Score: 6.461473289206789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes
Variational AutoEncoder), a variational autoencoder which uses Gaussian
processes (GP) to model the latent space for the unsupervised learning of
disentangled representations in video sequences. We improve upon previous work
by establishing a framework by which multiple features, static or dynamic, can
be disentangled. Specifically, we use fractional Brownian motions (fBM) and
Brownian bridges (BB) to enforce an inter-frame correlation structure in each
independent channel, and show that varying this structure enables one to
capture different factors of variation in the data. We demonstrate the quality
of our representations with experiments on three publicly available datasets,
and also quantify the improvement using a video prediction task. Moreover, we
introduce a novel geodesic loss function which takes into account the curvature
of the data manifold to improve learning. Our experiments show that the
combination of the improved representations with the novel loss function
enables MGP-VAE to outperform the baselines in video prediction.
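For concreteness, here is a minimal sketch of the prior construction the abstract describes, assuming frames are indexed at integer times; the function names and the assignment of processes to factor types are illustrative, not the paper's exact configuration.

```python
import numpy as np

def fbm_cov(n_frames, hurst):
    """Fractional Brownian motion covariance at times t = 1..n_frames:
    K(s, t) = 0.5 * (s^{2H} + t^{2H} - |s - t|^{2H})."""
    t = np.arange(1, n_frames + 1, dtype=float)
    s, u = np.meshgrid(t, t)
    return 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))

def bb_cov(n_frames):
    """Brownian bridge covariance on [0, T], pinned at both endpoints:
    K(s, t) = min(s, t) - s * t / T, evaluated at interior times 1..n_frames."""
    T = float(n_frames + 1)
    t = np.arange(1, n_frames + 1, dtype=float)
    s, u = np.meshgrid(t, t)
    return np.minimum(s, u) - s * u / T

# Each latent channel carries its own GP prior over the frame index; varying
# the covariance (e.g., fBM with different Hurst exponents vs. a bridge)
# changes which factor of variation the channel tends to capture.
rng = np.random.default_rng(0)
z_fbm = rng.multivariate_normal(np.zeros(8), fbm_cov(8, hurst=0.9))
z_bridge = rng.multivariate_normal(np.zeros(8), bb_cov(8))
```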
Related papers
- Variational Bayes Gaussian Splatting [44.43761190929142]
3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians.
We propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters.
Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data.
arXiv Detail & Related papers (2024-10-04T16:52:03Z)
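The summary gives no algorithmic detail; as a rough illustration of framing parameter fitting as variational inference, here is coordinate-ascent VI for the means of a 1-D Gaussian mixture with known variance, a heavily simplified stand-in for splat parameters (all names are ours, not VBGS's API).

```python
import numpy as np

def cavi_gmm_means(x, k, n_iters=50, prior_var=10.0, noise_var=1.0, seed=0):
    """Coordinate-ascent variational inference for the component means of a
    1-D Gaussian mixture (known variance, uniform weights, conjugate prior)."""
    rng = np.random.default_rng(seed)
    m = rng.normal(scale=prior_var ** 0.5, size=k)  # variational means
    s2 = np.ones(k)                                 # variational variances
    for _ in range(n_iters):
        # Variational assignment probabilities (the "E-step" analogue).
        logits = (np.outer(x, m) - 0.5 * (m ** 2 + s2)) / noise_var
        logits -= logits.max(axis=1, keepdims=True)
        phi = np.exp(logits)
        phi /= phi.sum(axis=1, keepdims=True)
        # Closed-form updates for q(mu_k), available by Gaussian conjugacy.
        nk = phi.sum(axis=0)
        s2 = 1.0 / (1.0 / prior_var + nk / noise_var)
        m = s2 * (phi * x[:, None]).sum(axis=0) / noise_var
    return m, s2, phi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
m, s2, phi = cavi_gmm_means(x, k=2)
```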
- Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and their accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
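The role geodesics play here parallels the geodesic loss in MGP-VAE above: distances should follow the decoded data manifold rather than straight lines in latent space. A generic sketch of that idea (not either paper's exact construction) measures curve length under the decoder's pullback metric G(z) = J(z)^T J(z):

```python
import torch

def pullback_metric(decoder, z):
    """Metric the decoder induces on latent space: G(z) = J(z)^T J(z),
    with J the Jacobian of the decoder at z."""
    J = torch.autograd.functional.jacobian(decoder, z)  # (out_dim, latent_dim)
    return J.T @ J

def curve_length(decoder, z_points):
    """Discretized Riemannian length of a latent curve under the pullback metric."""
    total = torch.tensor(0.0)
    for a, b in zip(z_points[:-1], z_points[1:]):
        mid, dz = (a + b) / 2, b - a
        total = total + torch.sqrt(dz @ pullback_metric(decoder, mid) @ dz)
    return total

# Toy decoder; length of the straight latent segment from z0 to z1, measured
# on the decoded manifold rather than in latent coordinates.
decoder = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 8))
z0, z1 = torch.zeros(2), torch.ones(2)
path = [z0 + t * (z1 - z0) for t in torch.linspace(0, 1, 10)]
print(curve_length(decoder, path))
```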
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
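As background for the functional view of data: an INR represents a signal as a small network from coordinates to values, so compressing the signal becomes compressing network parameters or latents. Below is a minimal SIREN-style fit, illustrative only; the paper's architecture and compression pipeline are more involved.

```python
import torch

class INR(torch.nn.Module):
    """Minimal implicit neural representation: a coordinate MLP with sine
    activations (SIREN-style) mapping (x, y) -> pixel value."""
    def __init__(self, hidden=64):
        super().__init__()
        self.l1 = torch.nn.Linear(2, hidden)
        self.l2 = torch.nn.Linear(hidden, hidden)
        self.l3 = torch.nn.Linear(hidden, 1)

    def forward(self, coords):
        h = torch.sin(30.0 * self.l1(coords))  # omega_0 = 30, as in SIREN
        h = torch.sin(self.l2(h))
        return self.l3(h)

# Fit one image: the signal "is" the network's weights, which is what makes
# the representation modality-agnostic (audio/video just change the coords).
img = torch.rand(32, 32)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32), torch.linspace(-1, 1, 32), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
inr = INR()
opt = torch.optim.Adam(inr.parameters(), lr=1e-4)
for _ in range(200):
    loss = ((inr(coords).squeeze(-1) - img.reshape(-1)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```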
- Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [9.145168943972067]
Multiple-instance learning (MIL) provides an effective way to tackle the video anomaly detection problem.
We propose to conduct novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training.
Our theoretical analysis provides a strong performance guarantee for the proposed algorithm.
arXiv Detail & Related papers (2022-03-24T04:00:49Z)
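For context on the MIL baseline the paper builds on: with only video-level labels, a standard MIL ranking objective pushes the top-scoring segment of an anomalous video above the top-scoring segment of a normal one. This is a sketch of that common formulation, not BN-SVP itself, whose contribution is the submodular partition on top of MIL training.

```python
import torch

def mil_ranking_loss(scores_anom, scores_norm, margin=1.0):
    """MIL ranking loss: the top-scoring segment of an anomalous video should
    outrank the top-scoring segment of a normal video by a margin, so only
    bag-level (whole-video) labels are needed."""
    return torch.relu(margin - scores_anom.max() + scores_norm.max())

# Toy bags: per-segment anomaly scores in [0, 1] for one video each.
scores_anom = torch.tensor([0.1, 0.9, 0.3])  # anomalous video (bag label = 1)
scores_norm = torch.tensor([0.2, 0.4, 0.1])  # normal video (bag label = 0)
print(mil_ranking_loss(scores_anom, scores_norm))  # tensor(0.5000)
```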
- Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
The resulting Non-Gaussian Gaussian Processes (NGGPs) outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
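To illustrate the core idea, with a simple monotone map standing in for the paper's ODE-based flow: push a GP draw through a shared invertible elementwise transformation, keeping the GP's dependence structure while making the marginals non-Gaussian.

```python
import numpy as np

def elementwise_flow(z, a=2.0):
    """A simple invertible elementwise map (strictly monotone for a > -1),
    applied to every component with shared parameters; a crude stand-in for
    the paper's ODE-based flow."""
    return z + a * np.tanh(z)

# Draw from a GP prior with a squared-exponential kernel, then warp it.
t = np.linspace(0, 1, 50)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.2) ** 2)
rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros_like(t), K + 1e-8 * np.eye(len(t)))
y = elementwise_flow(z)  # non-Gaussian marginals, GP dependence structure
```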
- Multi-Facet Clustering Variational Autoencoders [9.150555507030083]
High-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over.
We introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE), which learns multiple clusterings simultaneously and is trained fully unsupervised and end-to-end.
arXiv Detail & Related papers (2021-06-09T17:36:38Z)
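A hedged sketch of the "multiple clusterings" idea: give the encoder several independent categorical latents ("facets"), one per characteristic to cluster over. MFCVAE's actual model is a full VAE with a mixture prior per facet; the Gumbel-softmax heads below (all names ours) only show the structural idea.

```python
import torch
import torch.nn.functional as F

class MultiFacetEncoder(torch.nn.Module):
    """Encoder with several categorical 'facets': each facet is its own
    Gumbel-softmax latent, so the same input can be clustered along multiple
    independent axes (e.g., digit identity vs. style)."""
    def __init__(self, d=784, facets=(10, 5)):
        super().__init__()
        self.heads = torch.nn.ModuleList([torch.nn.Linear(d, k) for k in facets])

    def forward(self, x, tau=0.5):
        # One soft one-hot cluster assignment per facet.
        return [F.gumbel_softmax(h(x), tau=tau) for h in self.heads]

x = torch.randn(4, 784)
assignments = MultiFacetEncoder()(x)  # list of (4, 10) and (4, 5) soft one-hots
```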
- Consistency Regularization for Variational Auto-Encoders [14.423556966548544]
Variational auto-encoders (VAEs) are a powerful approach to unsupervised learning.
We propose a regularization method to enforce consistency in VAEs.
arXiv Detail & Related papers (2021-05-31T10:26:32Z)
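A sketch of one natural reading of the consistency regularizer, assuming it penalizes divergence between the posteriors of an input and a perturbed copy; the encoder, augmentation, and weighting here are illustrative, not the paper's exact recipe.

```python
import torch

def kl_diag_gaussians(mu1, logvar1, mu2, logvar2):
    """KL( N(mu1, var1) || N(mu2, var2) ), diagonal covariances, summed over dims."""
    return 0.5 * (logvar2 - logvar1
                  + (logvar1.exp() + (mu1 - mu2) ** 2) / logvar2.exp()
                  - 1.0).sum(-1)

def consistency_loss(encode, x, augment):
    """Keep the posterior of an augmented input close to that of the original;
    added to the usual ELBO with a regularization weight."""
    mu, logvar = encode(x)
    mu_a, logvar_a = encode(augment(x))
    return kl_diag_gaussians(mu_a, logvar_a, mu, logvar).mean()

# Toy encoder producing (mu, logvar) and a small jitter augmentation.
enc = torch.nn.Linear(8, 4)
encode = lambda x: enc(x).chunk(2, dim=-1)
x = torch.randn(16, 8)
print(consistency_loss(encode, x, lambda t: t + 0.01 * torch.randn_like(t)))
```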
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications for the learned representations of the fact that a trained VAE does not, in general, autoencode, and the consequences of fixing this by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond [82.18770740564642]
Variational autoencoders (VAEs) combine latent variables with amortized variational inference.
We observe an encoder-decoder incompatibility that leads to poor parameterizations of the data manifold.
We propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder that shares its structure.
arXiv Detail & Related papers (2020-04-20T10:34:10Z)
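A minimal sketch of one way to realize such a coupling, assuming the deterministic path decodes the posterior mean through a shared decoder and both reconstruction terms are trained jointly; the paper's actual coupling mechanism may differ.

```python
import torch

class CoupledVAE(torch.nn.Module):
    """Couples a stochastic (VAE) path and a deterministic path through a
    shared encoder/decoder; both reconstruction losses are trained jointly."""
    def __init__(self, d=8, z=2):
        super().__init__()
        self.enc = torch.nn.Linear(d, 2 * z)
        self.dec = torch.nn.Linear(z, d)  # same structure, shared here

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # stochastic path
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(-1).mean()
        mse = lambda r: ((r - x) ** 2).sum(-1).mean()
        return mse(self.dec(z)) + kl + mse(self.dec(mu))      # VAE + deterministic

model = CoupledVAE()
loss = model(torch.randn(32, 8))
loss.backward()
```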
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.