Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
- URL: http://arxiv.org/abs/2512.01831v1
- Date: Mon, 01 Dec 2025 16:13:23 GMT
- Title: Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
- Authors: Yudi Wu, Wenhao Zhao, Dianbo Liu,
- Abstract summary: Generative diversity varies significantly across discrete latent generative models such as AR, MIM, and Diffusion. We propose a diagnostic framework, grounded in Information Bottleneck (IB) theory, to analyze the underlying strategies that give rise to this behavior.
- Score: 4.138804085040435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative diversity varies significantly across discrete latent generative models such as AR, MIM, and Diffusion. We propose a diagnostic framework, grounded in Information Bottleneck (IB) theory, to analyze the underlying strategies that give rise to this behavior. The framework models generation as a conflict between a "Compression Pressure" - a drive to minimize overall codebook entropy - and a "Diversity Pressure" - a drive to maximize conditional entropy given an input. We further decompose this diversity into two primary sources: "Path Diversity", representing the choice of high-level generative strategies, and "Execution Diversity", the randomness in executing a chosen strategy. To make this decomposition operational, we introduce three zero-shot, inference-time interventions that directly perturb the latent generative process and reveal how models allocate and express diversity. Application of this probe-based framework to representative AR, MIM, and Diffusion systems reveals three distinct strategies: "Diversity-Prioritized" (MIM), "Compression-Prioritized" (AR), and "Decoupled" (Diffusion). Our analysis provides a principled explanation for their behavioral differences and informs a novel inference-time diversity enhancement technique.
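Read as a hedged sketch (the paper's exact objective is not stated in this listing; here $Z$ denotes the discrete latent codes, $X$ the input, $S$ a hypothetical high-level strategy variable, and $\beta$ a trade-off weight), the two pressures and the diversity decomposition suggest an IB-style trade-off of roughly this form:

```latex
% Illustrative IB-style objective, not the paper's stated formula:
\min_{p(z \mid x)} \;
  \underbrace{H(Z)}_{\text{compression pressure}}
  \;-\; \beta \,\underbrace{H(Z \mid X)}_{\text{diversity pressure}}

% If the strategy S is a deterministic function of the codes Z,
% the entropy chain rule splits the diversity term into the two sources:
H(Z \mid X) \;=\;
  \underbrace{H(S \mid X)}_{\text{path diversity}}
  \;+\; \underbrace{H(Z \mid S, X)}_{\text{execution diversity}}
```

The decomposition in the second line is exact under the stated assumption ($H(S \mid Z, X) = 0$); how the paper itself defines $S$ is not visible from this abstract.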
Related papers
- The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving [57.652356955571065]
State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops.
We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths.
We introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces.
arXiv Detail & Related papers (2026-01-02T17:10:31Z)
- Explainable Multimodal Regression via Information Decomposition [27.157278306251772]
We propose a novel multimodal regression framework grounded in Partial Information Decomposition (PID).
Our framework outperforms state-of-the-art methods in both predictive accuracy and interpretability, while also enabling informed modality selection for efficient inference.
arXiv Detail & Related papers (2025-12-26T18:07:18Z)
- DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation [22.400053095939402]
We introduce DiverseAR, a principled and effective method that enhances image diversity without sacrificing visual quality.
Specifically, we introduce an adaptive logits distribution scaling mechanism that dynamically adjusts the sharpness of the binary output distribution during sampling.
To mitigate potential fidelity loss caused by distribution smoothing, we develop an energy-based generation path search algorithm that avoids sampling low-confidence tokens.
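The core idea of scaling a binary output distribution's sharpness can be illustrated with plain temperature scaling of a Bernoulli logit. This is a generic sketch, not DiverseAR's actual adaptive mechanism; the `temperature` parameter and function name are illustrative assumptions:

```python
import math

def scaled_bit_probability(logit: float, temperature: float) -> float:
    """Probability that a binary token equals 1 after dividing its logit
    by `temperature`.

    temperature > 1 flattens the distribution (more sampling diversity);
    temperature < 1 sharpens it (higher confidence, less diversity).
    """
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# A confident logit of 2.0 under sharpening vs. flattening:
p_sharp = scaled_bit_probability(2.0, temperature=0.5)  # pushed toward 1
p_flat = scaled_bit_probability(2.0, temperature=4.0)   # pulled toward 0.5
```

An adaptive variant, as the abstract suggests, would choose the temperature per step rather than fix it globally, trading off the diversity gained from flatter distributions against the fidelity lost by sampling low-confidence bits.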
arXiv Detail & Related papers (2025-12-02T16:54:36Z)
- Variational Learning of Disentangled Representations [2.3713407563738063]
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific.
We introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors.
We show that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data.
arXiv Detail & Related papers (2025-06-20T17:36:12Z)
- GUD: Generation with Unified Diffusion [40.64742332352373]
Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples.
We develop a unified framework for diffusion generative models with greatly enhanced design freedom.
arXiv Detail & Related papers (2024-10-03T16:51:14Z)
- Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity.
Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z)
- Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion [37.18537753482751]
Conditional Diffusion Relaxing Inversion (CRDI) is designed to enhance distribution diversity in synthetic image generation.
CRDI does not rely on fine-tuning based on only a few samples.
It focuses on reconstructing each target image instance and expanding diversity through few-shot learning.
arXiv Detail & Related papers (2024-07-09T21:58:26Z)
- PGODE: Towards High-quality System Dynamics Modeling [40.76121531452706]
This paper studies the problem of modeling multi-agent dynamical systems, where agents could interact mutually to influence their behaviors.
Recent research predominantly uses geometric graphs to depict these mutual interactions, which are then captured by graph neural networks (GNNs).
We propose a new approach named Prototypical Graph ODE to address the problem.
arXiv Detail & Related papers (2023-11-11T12:04:47Z)
- Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows [6.064702468344376]
We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems.
We show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.
arXiv Detail & Related papers (2023-07-03T17:18:50Z)
- Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both the VP- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach, based on partial information decomposition (PID), to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.