VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder
- URL: http://arxiv.org/abs/2307.00852v2
- Date: Tue, 19 Mar 2024 01:30:03 GMT
- Title: VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder
- Authors: Yueen Ma, Dafeng Chi, Jingjing Li, Kai Song, Yuzheng Zhuang, Irwin King,
- Abstract summary: We introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE.
We perform comprehensive experiments with two types of Transformers on six datasets to show that our approach can significantly improve generative diversity while maintaining generative quality.
- Score: 38.35049378875308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The natural language generation domain has witnessed great success thanks to Transformer models. Although they have achieved state-of-the-art generative quality, they often neglect generative diversity. Prior attempts to tackle this issue suffer from either low model capacity or over-complicated architectures. Some recent methods employ the VAE framework to enhance diversity, but their latent variables fully depend on the input context, restricting exploration of the latent space. In this paper, we introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE via a more effective cross-attention-based connection, departing from conventional embedding concatenation or summation. Additionally, we propose integrating InfoGAN-style latent codes to enable input-independent variability, further diversifying the generation. Moreover, our framework accommodates discrete inputs alongside its existing support for continuous inputs. We perform comprehensive experiments with two types of Transformers on six datasets from three different NLG tasks to show that our approach can significantly improve generative diversity while maintaining generative quality.
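To make the two mechanisms in the abstract concrete, here is a minimal PyTorch sketch of (a) feeding the VAE latent variable into a Transformer through cross-attention rather than embedding concatenation or summation, and (b) sampling an input-independent, InfoGAN-style latent code with an auxiliary loss that keeps the code informative. All names and shapes (LatentCrossAttentionBridge, sample_latents, q_head) are hypothetical illustrations of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentCrossAttentionBridge(nn.Module):
    """Sketch: condition decoder hidden states on a VAE latent z and an
    input-independent InfoGAN-style code c via cross-attention."""

    def __init__(self, d_model=768, latent_dim=32, code_dim=8, n_heads=8):
        super().__init__()
        # Project [z; c] into a one-vector "memory" the decoder can attend to.
        self.to_memory = nn.Linear(latent_dim + code_dim, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden, z, c):
        # hidden: (B, T, d_model); z: (B, latent_dim); c: (B, code_dim)
        memory = self.to_memory(torch.cat([z, c], dim=-1)).unsqueeze(1)
        attended, _ = self.cross_attn(query=hidden, key=memory, value=memory)
        return hidden + attended  # residual connection

def sample_latents(mu, logvar, code_dim):
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # depends on input
    c = torch.randn(mu.size(0), code_dim)                    # input-independent
    return z, c

# Usage: resampling c with the input fixed varies the generation.
bridge = LatentCrossAttentionBridge()
hidden = torch.randn(2, 10, 768)
z, c = sample_latents(torch.zeros(2, 32), torch.zeros(2, 32), code_dim=8)
out = bridge(hidden, z, c)  # (2, 10, 768)

# InfoGAN-style auxiliary term (simplified surrogate for the variational
# mutual-information bound): predict c back from the conditioned states.
q_head = nn.Linear(768, 8)
mi_loss = F.mse_loss(q_head(out.mean(dim=1)), c)
```

For continuous codes, InfoGAN's variational mutual-information lower bound reduces (up to constants) to this kind of regression term, which is why the surrogate above is a reasonable stand-in for a sketch.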
Related papers
- LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework.
We introduce key innovations to optimize generative performance for vision tasks.
The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
arXiv Detail & Related papers (2024-11-18T12:05:27Z)
- Score-Based Multimodal Autoencoders [4.594159253008448]
Multimodal Variational Autoencoders (VAEs) facilitate the construction of a tractable posterior within the latent space, given multiple modalities.
In this study, we explore an alternative approach to enhance the generative performance of multimodal VAEs by jointly modeling the latent space of unimodal VAEs.
Our model combines the superior generative quality of unimodal VAEs with coherent integration across different modalities.
arXiv Detail & Related papers (2023-05-25T04:43:47Z)
- LayoutDM: Transformer-based Diffusion Model for Layout Generation [0.6445605125467572]
Denoising diffusion probabilistic models (DDPMs) have been proposed to generate high-quality images.
A Transformer-based conditional Layout Denoiser is proposed to generate samples from noised layout data.
Our method outperforms state-of-the-art generative models in terms of quality and diversity.
arXiv Detail & Related papers (2023-05-04T05:51:35Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- A Transformer Framework for Data Fusion and Multi-Task Learning in Smart Cities [99.56635097352628]
This paper proposes a Transformer-based AI system for emerging smart cities.
It supports virtually any input data and output task types present in S&CCs.
It is demonstrated through learning diverse task sets representative of S&CC environments.
arXiv Detail & Related papers (2022-11-18T20:43:09Z)
- Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
The Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
- Improving Diversity with Adversarially Learned Transformations for Domain Generalization [81.26960899663601]
We present a novel framework, adversarially learned transformations (ALT), which uses a neural network to model plausible yet hard image transformations.
We show that ALT can naturally work with existing diversity modules to produce highly distinct and large transformations of the source domain, leading to state-of-the-art performance.
arXiv Detail & Related papers (2022-06-15T18:05:24Z)
- Exploring Story Generation with Multi-task Objectives in Variational Autoencoders [41.89428478049741]
GPT-2 fails to generate consistent stories and lacks diversity.
Current story generation models incorporate additional information, such as plots or commonsense knowledge, into GPT-2 to guide the generation process.
We explore combining BERT and GPT-2 to build a variational autoencoder (VAE).
Our evaluations show our enhanced VAE can provide a better quality-diversity trade-off, generate less repetitive story content, and learn a more informative latent variable.
arXiv Detail & Related papers (2021-11-15T23:07:19Z)
- Transformer-based Conditional Variational Autoencoder for Controllable Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability.
We advocate to revive latent variable modeling, essentially the power of representation learning, in the era of Transformers.
Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE); see the sketch below.
arXiv Detail & Related papers (2021-01-04T08:31:11Z)
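The entry above follows the common recipe of wrapping a pretrained Transformer in a conditional VAE. As a rough illustration of that objective (reconstruction loss plus KL divergence with reparameterized sampling), here is a self-contained toy sketch; ToyCVAE, kl_weight, and the tiny stand-in Transformer blocks are assumptions for illustration, and a real system would use pretrained weights and a causal decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCVAE(nn.Module):
    """Toy conditional VAE around stand-in Transformer blocks."""

    def __init__(self, vocab=1000, d_model=128, latent_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)  # toy; not causal
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.from_z = nn.Linear(latent_dim, d_model)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, src_ids, tgt_ids, kl_weight=1.0):
        h = self.encoder(self.embed(src_ids)).mean(dim=1)        # pooled q(z|x) features
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        # Condition decoding on z by adding its projection at every position.
        dec_in = self.embed(tgt_ids) + self.from_z(z).unsqueeze(1)
        logits = self.lm_head(self.decoder(dec_in))
        recon = F.cross_entropy(logits.transpose(1, 2), tgt_ids)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl_weight * kl

# Usage with random token ids:
model = ToyCVAE()
loss = model(torch.randint(0, 1000, (2, 12)), torch.randint(0, 1000, (2, 12)))
```

Annealing kl_weight upward from zero during training is a common trick to keep the latent variable from collapsing, which is the failure mode these latent-variable text generation papers work around.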
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.