Your Autoregressive Generative Model Can be Better If You Treat It as an
Energy-Based One
- URL: http://arxiv.org/abs/2206.12840v1
- Date: Sun, 26 Jun 2022 10:58:41 GMT
- Title: Your Autoregressive Generative Model Can be Better If You Treat It as an
Energy-Based One
- Authors: Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio,
Dongsheng Li
- Abstract summary: We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
- Score: 83.5162421521224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive generative models are commonly used, especially for those
tasks involving sequential data. They have, however, been plagued by a slew of
inherent flaws due to the intrinsic characteristics of chain-style conditional
modeling (e.g., exposure bias or lack of long-range coherence), severely
limiting their ability to model distributions properly. In this paper, we
propose a unique method termed E-ARM for training autoregressive generative
models that takes advantage of a well-designed energy-based learning objective.
By leveraging the extra degree of freedom of the softmax operation, the
autoregressive model itself can serve as an energy-based model that scores the
likelihood of inputs without introducing any extra parameters. Furthermore, we
show that E-ARM can be trained efficiently and is capable of alleviating the
exposure bias problem and increasing temporal coherence for autoregressive
generative models. Extensive empirical results, covering
benchmarks like language modeling, neural machine translation, and image
generation, demonstrate the effectiveness of the proposed approach.
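To make the core mechanism concrete: softmax(logits) is unchanged when a constant is added to every logit, so the un-normalized scale of an autoregressive model's logits is a spare degree of freedom that can be reused to define a sequence-level energy with no additional parameters. The sketch below only illustrates that idea under assumed shapes and a simplified loss; it is not the authors' released code, and E-ARM's actual energy-based objective (which also draws negative samples from the model) is described in the paper.

```python
# Minimal illustrative sketch (assumptions, not the paper's code): per-step
# logits from an autoregressive model define both the usual likelihood and,
# via the softmax's free offset, a sequence-level energy.
import torch
import torch.nn.functional as F


def autoregressive_nll(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Standard AR objective: negative log-likelihood of the observed tokens.

    logits: (batch, seq_len, vocab); logits[:, t] scores the token at step t
    given the preceding tokens. tokens: (batch, seq_len).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    picked = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    return -picked.sum(dim=-1)  # (batch,)


def sequence_energy(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Reuse the softmax's free offset: E(x) = -sum_t logit_t[x_t].

    Softmax probabilities ignore a constant shift of the logits, so the raw
    logit values can additionally parameterize exp(-E(x)) as an un-normalized
    density over whole sequences, with no new parameters.
    """
    picked = logits.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    return -picked.sum(dim=-1)  # (batch,)


# Toy usage with random stand-ins for model outputs and data.
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
tokens = torch.randint(vocab, (batch, seq_len))

# Hypothetical combined loss: the usual AR term plus a term that lowers the
# energy of real data. The actual E-ARM objective is contrastive and balances
# this against samples drawn from the model, which is omitted here.
loss = autoregressive_nll(logits, tokens).mean() + 0.1 * sequence_energy(logits, tokens).mean()
loss.backward()
```

Roughly speaking, it is the contrastive part of the energy objective, with negatives coming from the model's own autoregressive samples, that links the energy-based view to reduced exposure bias; the sketch above omits that part for brevity.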
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z)
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Generative Marginalization Models [21.971818180264943]
Marginalization models (MAMs) are a new family of generative models for high-dimensional discrete data.
They offer scalable and flexible generative modeling by explicitly modeling all induced marginal distributions.
For energy-based training tasks, MAMs enable any-order generative modeling of high-dimensional problems beyond the scale of previous methods.
arXiv Detail & Related papers (2023-10-19T17:14:29Z)
- Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z)
- Controllable and Compositional Generation with Latent-Space Energy-Based Models [60.87740144816278]
Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024 (a generic sketch of this energy-composition recipe appears after this list).
arXiv Detail & Related papers (2021-10-21T03:31:45Z)
- DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications [0.0]
One of the limitations of deep learning models with sparse features today stems from the predefined nature of their input.
We show that the resulting models are able to perform better and efficiently run at a much larger scale.
arXiv Detail & Related papers (2020-04-17T17:43:51Z)
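Regarding the energy-function composition mentioned in the Controllable and Compositional Generation entry above: energy-based models combine naturally under logical operators, since a product of densities (AND) corresponds to summing energies and a mixture (OR) to a soft minimum of energies. The snippet below is a generic sketch of that standard recipe with made-up toy energies; it is not that paper's latent-space implementation.

```python
# Generic sketch of composing energy-based models with logical operators.
# The toy attribute energies are placeholders; real EBMs would be learned networks.
import torch


def e_attr_a(x: torch.Tensor) -> torch.Tensor:
    """Placeholder energy: low when the first coordinate is near 1."""
    return (x[:, 0] - 1.0) ** 2


def e_attr_b(x: torch.Tensor) -> torch.Tensor:
    """Placeholder energy: low when the second coordinate is near -1."""
    return (x[:, 1] + 1.0) ** 2


def e_and(x: torch.Tensor) -> torch.Tensor:
    # Conjunction = product of densities exp(-E_a) * exp(-E_b), so energies add.
    return e_attr_a(x) + e_attr_b(x)


def e_or(x: torch.Tensor) -> torch.Tensor:
    # Disjunction = mixture of densities, exp(-E) proportional to
    # exp(-E_a) + exp(-E_b), i.e. a soft minimum of the two energies.
    return -torch.logsumexp(torch.stack([-e_attr_a(x), -e_attr_b(x)]), dim=0)


# The sample satisfying both attributes gets the lowest conjunction energy.
x = torch.tensor([[1.0, -1.0], [1.0, 1.0], [0.0, 0.0]])
print(e_and(x), e_or(x))
```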