Related papers: Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse

URL: http://arxiv.org/abs/2512.14879v1
Date: Tue, 16 Dec 2025 19:50:03 GMT
Title: Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse
Authors: Jingwei Chen
Abstract summary: We present Entropy-Reservoir Bregman Projection (ERBP), an information-geometric framework that unifies model-collapse phenomena. Our theory yields (i) a necessary condition for collapse, (ii) a sufficient condition that guarantees a non-trivial entropy floor, and (iii) closed-form rates that depend on sample size.
Score: 3.533187668612022
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-referential learning -- training a model on data it generated itself -- promises boundless scalability but chronically suffers from model collapse: language models degenerate into repetitive text, GANs drop modes, and reinforcement-learning policies over-exploit. Although practitioners employ ad hoc fixes such as real-data mixing, entropy bonuses, knowledge distillation, or retrieval-augmented generation, a single principle that explains both the failure mode and the success of these fixes has remained elusive. We present Entropy-Reservoir Bregman Projection (ERBP), an information-geometric framework that unifies these phenomena. We model the closed loop as a stochastic Bregman projection sequence in distribution space. Without external coupling, finite-sample noise forces the system to project onto an ever-shrinking empirical support, causing exponential entropy decay and eventual collapse. Introducing an Entropy Reservoir -- a high-entropy distribution mixed into each projection -- injects a controllable entropy flux that provably stabilises the dynamics. Our theory yields (i) a necessary condition for collapse, (ii) a sufficient condition that guarantees a non-trivial entropy floor, and (iii) closed-form rates that depend only on sample size and the strong-convexity/Lipschitz constants of the Bregman generator. Experiments on large-language-model self-training, Soft Actor-Critic in reinforcement learning, and GAN optimisation validate our predictions and show that disparate stabilisation heuristics correspond to specific reservoir choices and coupling coefficients. ERBP thus transforms a collection of folk remedies into a single, quantitative design rule: monitor and budget your entropy flux.
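
The closed-loop mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' code or experimental setup. It assumes a categorical model refit by empirical frequencies (the maximum-likelihood fit, standing in for the projection step) and a uniform distribution as the reservoir. The names self_train, lam, k, and n are hypothetical and chosen only for this illustration.

```python
import math
import random
from collections import Counter


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def self_train(k=50, n=100, steps=200, lam=0.0, seed=0):
    """Toy self-referential loop over a k-category distribution.

    Each step draws n samples from the current model, refits the model
    to the empirical frequencies, then mixes in a uniform "reservoir"
    with weight lam (lam=0.0 means no external coupling).
    Returns the entropy after every step, starting from uniform.
    """
    rng = random.Random(seed)
    p = [1.0 / k] * k
    history = [entropy(p)]
    for _ in range(steps):
        # Finite-sample resampling: a category with zero count
        # can never return when lam == 0.
        counts = Counter(rng.choices(range(k), weights=p, k=n))
        empirical = [counts[i] / n for i in range(k)]
        # Reservoir coupling: convex mixture with the uniform distribution.
        p = [(1 - lam) * e + lam / k for e in empirical]
        history.append(entropy(p))
    return history


closed = self_train(lam=0.0)   # closed loop, no reservoir
coupled = self_train(lam=0.1)  # 10% uniform reservoir each step
print(f"final entropy without reservoir: {closed[-1]:.3f} nats")
print(f"final entropy with reservoir:    {coupled[-1]:.3f} nats")
print(f"guaranteed floor lam*log(k):     {0.1 * math.log(50):.3f} nats")
```

In this toy setting, concavity of entropy gives a floor of lam * log(k) for the coupled run, whereas the closed run has no such guarantee because its support can only shrink. The floor here is a property of this simplified model and should not be read as the bound derived in the paper.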

Related papers

Emergence of Distortions in High-Dimensional Guided Diffusion Models [11.774563966512707]
We formalize the phenomenon of generative distortion, defined as the mismatch between CFG-induced sampling and the true conditional distribution. We show that standard CFG schedules are incapable of preventing variance shrinkage. We propose a theoretically motivated guidance schedule featuring a negative-guidance window, which mitigates loss of diversity while preserving class separability.
arXiv Detail & Related papers (2026-01-31T13:19:45Z)
Entropy Production in Machine Learning Under Fokker-Planck Probability Flow [0.0]
We propose an entropy-based retraining framework grounded in non-equilibrium cost dynamics. We show that entropy-triggered retraining achieves predictive performance comparable to high-frequency retraining.
arXiv Detail & Related papers (2026-01-02T04:01:57Z)
Chaos, Entanglement and Measurement: Field-Theoretic Perspectives on Quantum Information Dynamics [0.0]
I study scrambling and pseudorandomness in the Brownian Sachdev-Ye-Kitaev (SYK) model. I construct a field theory for weakly measured SYK clusters. I develop a strong-disorder renormalization group for measurement-only SYK clusters.
arXiv Detail & Related papers (2025-12-11T10:04:30Z)
On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering [29.633206995806542]
In-time steering enables pretrained diffusion/flow models to be adapted to new tasks without retraining. This construction harbors a critical and previously unformalized failure mode: Marginal Path Collapse. We introduce Adaptive path Correction with Exponents (ACE), which extends Feynman-Kac steering to time-varying exponents.
arXiv Detail & Related papers (2025-12-11T06:44:08Z)
Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium [0.6820746164515952]
We introduce the closed-loop prediction principle, which requires that models iteratively refine latent representations until reaching a self-consistent equilibrium. We instantiate this principle as Equilibrium Transformers, which augment standard transformer layers with an Equilibrium Refinement Module. Preliminary experiments on the binary parity task demonstrate +3.28% average improvement on challenging sequences, with gains reaching +8.07% where standard transformers approach random performance.
arXiv Detail & Related papers (2025-11-26T20:02:59Z)
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models [77.55829017952728]
EntPruner is an entropy-guided automatic progressive pruning framework for diffusion and flow models. Experiments on DiT and SiT models demonstrate the effectiveness of EntPruner, achieving up to 2.22× inference speedup.
arXiv Detail & Related papers (2025-11-26T07:20:48Z)
A Free Probabilistic Framework for Denoising Diffusion Models: Entropy, Transport, and Reverse Processes [22.56299060022639]
This paper builds on Voiculescu's theory of free entropy and free Fisher information. We formulate diffusion and quantify reverse processes governed by operator-valued dynamics. The resulting dynamics admit a gradient-flow structure in the noncommutative Wasserstein space.
arXiv Detail & Related papers (2025-10-26T18:03:54Z)
Convergence and Generalization of Anti-Regularization for Parametric Models [0.0]
Anti-regularization introduces a reward term with a reversed sign into the loss function. We formalize spectral safety conditions and trust-region constraints. We design a lightweight safeguard that combines a projection operator with gradient clipping to guarantee stable intervention.
arXiv Detail & Related papers (2025-08-24T15:34:17Z)
Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning [79.65014491424151]
We propose a quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM). It enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces. This paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
arXiv Detail & Related papers (2025-05-08T11:48:21Z)
One-for-More: Continual Diffusion Model for Anomaly Detection [63.50488826645681]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting". We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images. We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z)
GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples. GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution. We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
Preventing Posterior Collapse with Levenshtein Variational Autoencoder [61.30283661804425]
We propose to replace the evidence lower bound (ELBO) with a new objective which is simple to optimize and prevents posterior collapse. We show that Levenshtein VAE produces more informative latent representations than alternative approaches to preventing posterior collapse.
arXiv Detail & Related papers (2020-04-30T13:27:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.