Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders
- URL: http://arxiv.org/abs/2010.03467v1
- Date: Wed, 7 Oct 2020 15:04:20 GMT
- Title: Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders
- Authors: Benoit Gaujac and Ilya Feige and David Barber
- Abstract summary: We propose a novel approach to training models with deep-latent hierarchies based on Optimal Transport.
We show that our method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well-known "latent variable collapse" issue of VAEs.
- Score: 22.54887526392739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic models with hierarchical-latent-variable structures provide
state-of-the-art results amongst non-autoregressive, unsupervised density-based
models. However, the most common approach to training such models based on
Variational Autoencoders (VAEs) often fails to leverage deep-latent
hierarchies; successful approaches require complex inference and optimisation
schemes. Optimal Transport is an alternative, non-likelihood-based framework
for training generative models with appealing theoretical properties, in
principle allowing easier training convergence between distributions. In this
work we propose a novel approach to training models with deep-latent
hierarchies based on Optimal Transport, without the need for highly bespoke
models and inference networks. We show that our method enables the generative
model to fully leverage its deep-latent hierarchy, avoiding the well-known
"latent variable collapse" issue of VAEs, thereby providing qualitatively
better sample generation as well as more interpretable latent representations
than the original Wasserstein Autoencoder with Maximum Mean Discrepancy
divergence.
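The abstract contrasts the proposed approach with the original Wasserstein Autoencoder trained with a Maximum Mean Discrepancy (MMD) penalty. As a rough illustration only, not the paper's implementation, the following minimal NumPy sketch shows the standard WAE-MMD objective: a reconstruction cost plus an RBF-kernel MMD term that matches the aggregate encoded latents to prior samples. The function and parameter names (`rbf_kernel`, `mmd2`, `wae_mmd_objective`, `lam`, `sigma`) are illustrative choices, not names from the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of x and y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(z_q, z_p, sigma=1.0):
    """Biased estimator of squared MMD between encoded latents z_q
    and samples z_p drawn from the prior."""
    k_qq = rbf_kernel(z_q, z_q, sigma)
    k_pp = rbf_kernel(z_p, z_p, sigma)
    k_qp = rbf_kernel(z_q, z_p, sigma)
    return k_qq.mean() + k_pp.mean() - 2.0 * k_qp.mean()

def wae_mmd_objective(x, x_recon, z_q, z_p, lam=10.0):
    """WAE-MMD loss: reconstruction error plus a weighted MMD penalty
    pushing the aggregate posterior toward the prior."""
    recon = ((x - x_recon) ** 2).mean()
    return recon + lam * mmd2(z_q, z_p)
```

In this formulation the penalty acts on the aggregate latent distribution rather than per-datapoint posteriors, which is one reason WAEs behave differently from VAEs with respect to latent-variable collapse.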
Related papers
- Deep Autoencoder with SVD-Like Convergence and Flat Minima [1.0742675209112622]
We propose a learnable weighted hybrid autoencoder to overcome the Kolmogorov barrier.
We empirically find that our trained model has a sharpness thousands of times smaller compared to other models.
arXiv Detail & Related papers (2024-10-23T00:04:26Z)
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Bayesian sparsification for deep neural networks with Bayesian model reduction [0.6144680854063939]
We advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights.
BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model.
We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
arXiv Detail & Related papers (2023-09-21T14:10:47Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Sequential Bayesian Neural Subnetwork Ensembles [4.6354120722975125]
We propose an approach for sequential ensembling of dynamic Bayesian neural networks that consistently maintains reduced model complexity throughout the training process.
Our proposed approach outperforms traditional dense and sparse deterministic and Bayesian ensemble models in terms of prediction accuracy, uncertainty estimation, out-of-distribution detection, and adversarial robustness.
arXiv Detail & Related papers (2022-06-01T22:57:52Z)
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning [39.01805509055988]
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies.
This paper compares design choices and proposes novel protocols to investigate their interaction with other hyperparameters, such as the number of models or the imaginary rollout horizon.
arXiv Detail & Related papers (2021-10-08T13:51:34Z)
- Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity into the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Self-Reflective Variational Autoencoder [21.054722609128525]
Variational Autoencoder (VAE) is a powerful framework for learning latent variable generative models.
We introduce a solution, which we call self-reflective inference.
We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs)
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.