ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
- URL: http://arxiv.org/abs/2410.04721v1
- Date: Mon, 7 Oct 2024 03:22:51 GMT
- Title: ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
- Authors: Hyungjin Chung, Dohun Lee, Jong Chul Ye,
- Abstract summary: Autoregressive models (ARMs) and diffusion models (DMs) represent two leading paradigms in generative modeling.
We introduce Autoregressive Coherent multimodal generation with Diffusion Correction (ACDC)
ACDC combines the strengths of both ARMs and DMs at the inference stage without the need for additional fine-tuning.
- Score: 55.03585818289934
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autoregressive models (ARMs) and diffusion models (DMs) represent two leading paradigms in generative modeling, each excelling in distinct areas: ARMs in global context modeling and long-sequence generation, and DMs in generating high-quality local contexts, especially for continuous data such as images and short videos. However, ARMs often suffer from exponential error accumulation over long sequences, leading to physically implausible results, while DMs are limited by their local context generation capabilities. In this work, we introduce Autoregressive Coherent multimodal generation with Diffusion Correction (ACDC), a zero-shot approach that combines the strengths of both ARMs and DMs at the inference stage without the need for additional fine-tuning. ACDC leverages ARMs for global context generation and memory-conditioned DMs for local correction, ensuring high-quality outputs by correcting artifacts in generated multimodal tokens. In particular, we propose a memory module based on large language models (LLMs) that dynamically adjusts the conditioning texts for the DMs, preserving crucial global context information. Our experiments on multimodal tasks, including coherent multi-frame story generation and autoregressive video generation, demonstrate that ACDC effectively mitigates the accumulation of errors and significantly enhances the quality of generated outputs, achieving superior performance while remaining agnostic to specific ARM and DM architectures. Project page: https://acdc2025.github.io/
Related papers
- Multimodal Latent Language Modeling with Next-Token Diffusion [111.93906046452125]
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video)
We propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers.
arXiv Detail & Related papers (2024-12-11T18:57:32Z) - Scaling up Masked Diffusion Models on Text [43.16800764711572]
Masked diffusion models (MDMs) have shown promise in language modeling.
This paper establishes the first scaling law for MDMs.
We train a family of MDMs with up to 1.1 billion (B) parameters to evaluate their performance against larger sizes.
arXiv Detail & Related papers (2024-10-24T08:01:22Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates much more superior performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax [73.03684002513218]
We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution.
We show that such modification allows for learning uniformly and normally distributed representations.
The results indicate a moderate trade-off between the performance on the downstream tasks and quality of DM.
arXiv Detail & Related papers (2024-10-09T15:40:04Z) - CCDM: Continuous Conditional Diffusion Models for Image Generation [22.70942688582302]
Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables.
Existing Conditional Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced data.
To enhance the quality of generated images, a promising alternative is to replace CcGANs with Conditional Diffusion Models (CDMs)
arXiv Detail & Related papers (2024-05-06T15:10:19Z) - Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension [6.602323571343169]
Integrated during the fine-tuning phase of pre-trained generative language models (PLMs), QASE significantly enhances their performance.
The efficacy of the QASE module has been rigorously tested across various datasets.
arXiv Detail & Related papers (2024-04-27T19:42:51Z) - Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized generative adversarial-Decoding Diffusion Probabilistic Models (EDDPMs)
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z) - Extrapolating Multilingual Understanding Models as Multilingual
Generators [82.1355802012414]
This paper explores methods to empower multilingual understanding models the generation abilities to get a unified model.
We propose a textbfSemantic-textbfGuided textbfAlignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.