Your Autoregressive Generative Model Can be Better If You Treat It as an
Energy-Based One
- URL: http://arxiv.org/abs/2206.12840v1
- Date: Sun, 26 Jun 2022 10:58:41 GMT
- Title: Your Autoregressive Generative Model Can be Better If You Treat It as an
Energy-Based One
- Authors: Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio,
Dongsheng Li
- Abstract summary: We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
- Score: 83.5162421521224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive generative models are commonly used, especially for those
tasks involving sequential data. They have, however, been plagued by a slew of
inherent flaws due to the intrinsic characteristics of chain-style conditional
modeling (e.g., exposure bias or lack of long-range coherence), severely
limiting their ability to model distributions properly. In this paper, we
propose a unique method termed E-ARM for training autoregressive generative
models that takes advantage of a well-designed energy-based learning objective.
By leveraging the extra degree of freedom of the softmax operation, we can
make the autoregressive model itself serve as an energy-based model for
measuring the likelihood of the input without introducing any extra
parameters. Furthermore, we show that E-ARM can be trained efficiently and is
capable of alleviating the exposure bias problem and increasing temporal coherence for
autoregressive generative models. Extensive empirical results, covering
benchmarks like language modeling, neural machine translation, and image
generation, demonstrate the effectiveness of the proposed approach.
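The abstract's central observation, that the softmax leaves one degree of freedom per step unused, and that this freedom lets the same logits double as an unnormalized energy, can be illustrated with a toy sketch. This is a minimal illustration of the general idea, not the paper's actual training objective; all names are our own.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerically stable
    e = np.exp(z)
    return e / e.sum()

# Toy stand-in for an autoregressive model's per-step logits.
rng = np.random.default_rng(0)
V, T = 5, 3                       # vocab size, sequence length
logits = rng.normal(size=(T, V))  # pretend these come from the network

# The softmax is shift-invariant: adding any per-step constant c_t to the
# logits leaves the conditional distribution p(x_t | x_<t) unchanged.
probs_orig = np.array([softmax(logits[t]) for t in range(T)])
shift = rng.normal(size=(T, 1))   # one free constant per step
probs_shifted = np.array([softmax((logits + shift)[t]) for t in range(T)])
assert np.allclose(probs_orig, probs_shifted)

# That free constant is the "extra degree of freedom": the raw logits can
# additionally encode an unnormalized sequence energy with the SAME
# parameters, e.g. (one common convention, illustrative only):
x = [2, 0, 4]                                      # toy token ids
energy = -sum(logits[t, x[t]] for t in range(T))   # E(x) = -sum of logits
print(energy)
```

The assertion makes the key point concrete: the shift changes nothing about the conditionals the model is trained on, so the energy reading comes for free.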
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Generative Marginalization Models [24.694121731706314]
Marginalization models (MaMs) are a new family of generative models for high-dimensional discrete data.
They offer scalable and flexible generative modeling with tractable likelihoods.
For energy-based training tasks, MaMs enable any-order generative modeling of high-dimensional problems.
arXiv Detail & Related papers (2023-10-19T17:14:29Z)
- Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z)
- Controllable and Compositional Generation with Latent-Space Energy-Based Models [60.87740144816278]
Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
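Composing energy functions with logical operators typically reduces to simple arithmetic on the energies, conjunction as a sum, disjunction as a soft minimum. The following is a generic EBM-composition sketch for intuition, not that paper's exact formulation; the toy energies and names are our own.

```python
import numpy as np

# Two toy attribute energies over a 1-D "latent" z: low energy where the
# attribute holds. Generic composition recipe for EBMs:
#   AND: E1(z) + E2(z)               (both constraints must hold)
#   OR:  -log(exp(-E1) + exp(-E2))   (soft minimum of the two energies)
E1 = lambda z: (z - 1.0) ** 2        # attribute A: prefers z near +1
E2 = lambda z: (z + 1.0) ** 2        # attribute B: prefers z near -1

E_and = lambda z: E1(z) + E2(z)
E_or = lambda z: -np.logaddexp(-E1(z), -E2(z))

z = np.linspace(-3, 3, 601)
# AND is minimized between the two modes; OR keeps both modes.
z_and = z[np.argmin(E_and(z))]
print(round(abs(float(z_and)), 2))  # → 0.0
```

Sampling from the composed energy (e.g. with Langevin dynamics in the latent space) then yields images satisfying the combined attributes, which is the mechanism the entry above refers to.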
arXiv Detail & Related papers (2021-10-21T03:31:45Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate each dimension of the next state and reward sequentially, conditioned on previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors [10.906666680425754]
We propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR).
To eliminate compounding errors, we use our model only to generate single-step rollouts.
arXiv Detail & Related papers (2020-06-08T21:38:15Z)
- DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications [0.0]
One of the limitations of deep learning models with sparse features today stems from the predefined nature of their input.
We show that the resulting models perform better and run efficiently at a much larger scale.
arXiv Detail & Related papers (2020-04-17T17:43:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.