A theoretical perspective on mode collapse in variational inference
- URL: http://arxiv.org/abs/2410.13300v1
- Date: Thu, 17 Oct 2024 07:56:30 GMT
- Title: A theoretical perspective on mode collapse in variational inference
- Authors: Roman Soletskyi, Marylou Gabrié, Bruno Loureiro
- Abstract summary: We show that mode collapse is present even in statistically favorable scenarios, and identify two key mechanisms driving it: mean alignment and vanishing weight.
Our theoretical findings are consistent with the implementation of variational inference using normalizing flows, a class of popular generative models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While deep learning has expanded the possibilities for highly expressive variational families, the practical benefits of these tools for variational inference (VI) are often limited by the minimization of the traditional Kullback-Leibler objective, which can yield suboptimal solutions. A major challenge in this context is \emph{mode collapse}: the phenomenon where a model concentrates on a few modes of the target distribution during training, despite being statistically capable of expressing them all. In this work, we carry a theoretical investigation of mode collapse for the gradient flow on Gaussian mixture models. We identify the key low-dimensional statistics characterizing the flow, and derive a closed set of low-dimensional equations governing their evolution. Leveraging this compact description, we show that mode collapse is present even in statistically favorable scenarios, and identify two key mechanisms driving it: mean alignment and vanishing weight. Our theoretical findings are consistent with the implementation of VI using normalizing flows, a class of popular generative models, thereby offering practical insights.
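As a toy illustration of the phenomenon the abstract describes (not the paper's derivation), the sketch below fits a two-component Gaussian mixture to a two-mode target by gradient descent on the reverse KL. Although the variational family can express both modes, both component means drift onto the same mode when initialized in its basin, mirroring the "mean alignment" mechanism. All settings here are illustrative.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def p(x):
    """Target: symmetric two-mode Gaussian mixture with modes at -3 and +3."""
    return 0.5 * normal_pdf(x, -3.0) + 0.5 * normal_pdf(x, 3.0)

# Quadrature grid for the 1D integrals.
DX = 0.1
GRID = [i * DX for i in range(-100, 101)]  # x in [-10, 10]

def reverse_kl(m1, m2):
    """KL(q || p) for q = 0.5 N(m1,1) + 0.5 N(m2,1), by grid quadrature."""
    total = 0.0
    for x in GRID:
        qx = 0.5 * normal_pdf(x, m1) + 0.5 * normal_pdf(x, m2)
        if qx > 1e-300:
            total += qx * math.log(qx / p(x)) * DX
    return total

def gradient(m1, m2, h=1e-4):
    """Central-difference gradient of the reverse KL in the two means."""
    g1 = (reverse_kl(m1 + h, m2) - reverse_kl(m1 - h, m2)) / (2 * h)
    g2 = (reverse_kl(m1, m2 + h) - reverse_kl(m1, m2 - h)) / (2 * h)
    return g1, g2

# Both variational means start in the basin of the right-hand mode.
m1, m2, lr = 2.0, 4.0, 0.25
for _ in range(250):
    g1, g2 = gradient(m1, m2)
    m1 -= lr * g1
    m2 -= lr * g2

# Both components end up covering the same mode: the left mode at -3 is
# never recovered, even though the family could express it.
print(m1, m2, reverse_kl(m1, m2))
```

Note the final KL stays bounded away from zero (near log 2), because the reverse KL barely penalizes q for placing no mass on the uncovered mode, which is exactly why gradient flow has no incentive to escape.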
Related papers
- Annealing in variational inference mitigates mode collapse: A theoretical study on Gaussian mixtures [12.937511747845436]
We provide a mathematical analysis of strategies for mitigating mode collapse in a tractable setting.
Our analysis shows that an appropriately chosen annealing scheme can robustly prevent mode collapse.
We present numerical evidence that these theoretical tradeoffs qualitatively extend to neural-network-based models such as RealNVP normalizing flows.
arXiv Detail & Related papers (2026-02-13T13:28:23Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.
Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.
We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity [9.174891098192951]
We investigate why diffusion and flow matching techniques generalize so effectively.
We show that in high-dimensional settings, the stochastic and closed-form versions of the flow matching loss yield nearly equivalent losses.
arXiv Detail & Related papers (2025-06-04T08:50:32Z) - Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
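As a minimal sketch of just one ingredient mentioned in the blurb, and not the paper's HIGGS method (which also uses Hadamard rotations), the snippet below scans for the scale of a symmetric uniform quantization grid that minimizes reconstruction MSE. The weight vector and bit-width are made up; with an outlier present, shrinking the grid below the naive max-abs scale typically trades outlier clipping for better resolution of the small weights.

```python
def quantize(w, scale, bits=3):
    """Round-to-nearest onto a symmetric uniform integer grid, then rescale."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(x / scale))) * scale for x in w]

def mse(w, wq):
    return sum((a - b) ** 2 for a, b in zip(w, wq)) / len(w)

# A small weight vector with one outlier entry (illustrative values).
w = [0.11, -0.34, 0.02, 0.57, -0.21, 0.08, -0.45, 2.8]
bits = 3

# Naive scale: stretch the grid so the largest weight is representable.
naive_scale = max(abs(x) for x in w) / (2 ** (bits - 1) - 1)

# MSE-optimal scale: scan shrunken candidates and keep the best one.
candidates = [naive_scale] + [naive_scale * t / 100.0 for t in range(10, 100)]
best_err, best_scale = min((mse(w, quantize(w, s, bits)), s) for s in candidates)

print(best_scale, best_err, mse(w, quantize(w, naive_scale, bits)))
```

Because the naive scale is itself a candidate, the scanned optimum can never be worse than max-abs scaling.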
arXiv Detail & Related papers (2024-11-26T15:35:44Z) - FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand.
The proposed idea gains superior performance and higher run-time efficiency against strong baselines.
We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z) - Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit [1.7597525104451157]
The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE).
Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs).
We analyze the locations and stability of the ODEs' fixed points, unveiling several interesting findings.
arXiv Detail & Related papers (2024-06-11T03:07:41Z) - Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory [87.00653989457834]
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
Despite the empirical success, theory of conditional diffusion models is largely missing.
This paper bridges the gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models.
arXiv Detail & Related papers (2024-03-18T17:08:24Z) - Model Collapse Demystified: The Case of Regression [12.115359951879462]
We study the phenomenon of "model collapse" in the era of proliferation of large language and image generation models.
We obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes.
We propose a simple strategy based on adaptive regularization to mitigate model collapse.
arXiv Detail & Related papers (2024-02-12T15:26:01Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions.
We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace.
We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
arXiv Detail & Related papers (2023-10-06T17:50:38Z) - Eliminating Lipschitz Singularities in Diffusion Models [51.806899946775076]
We show that diffusion models frequently exhibit an unbounded Lipschitz constant near the zero timestep.
This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations.
We propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near the zero timestep.
arXiv Detail & Related papers (2023-06-20T03:05:28Z) - Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex.
This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others.
We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
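As the blurb notes, the categorical cross-entropy is routinely applied to targets that live on the simplex rather than at its vertices, for instance smoothed labels. A minimal sketch of that practice (numbers illustrative, not from the paper): for a confident, correct prediction, the smoothed target yields a strictly higher loss than the one-hot target, since it demands residual mass on the wrong classes.

```python
import math

def cross_entropy(target, pred):
    """H(target, pred) = -sum_i target_i * log(pred_i).
    The target may be any point on the simplex, not just a one-hot vector."""
    return -sum(t * math.log(q) for t, q in zip(target, pred) if t > 0.0)

def smooth(one_hot, eps):
    """Label smoothing: mix the one-hot target with the uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * t + eps / k for t in one_hot]

pred = [0.9, 0.05, 0.05]      # a confident, correct prediction
hard = [1.0, 0.0, 0.0]
soft = smooth(hard, eps=0.1)  # simplex-valued target, still sums to 1

hard_ce = cross_entropy(hard, pred)
soft_ce = cross_entropy(soft, pred)
print(hard_ce, soft_ce)
```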
arXiv Detail & Related papers (2020-11-10T16:44:35Z) - Robust model training and generalisation with Studentising flows [22.757298187704745]
We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics.
We propose to endow flow-based models with fat-tailed latent distributions as a simple drop-in replacement for the Gaussian distribution.
Experiments on several different datasets confirm the efficacy of the proposed approach.
arXiv Detail & Related papers (2020-06-11T16:47:01Z)
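A toy comparison motivating the fat-tailed latent distributions proposed in the Studentising-flows entry above (an illustration, not code from the paper): the Student-t density assigns far more log-likelihood to an outlier than the standard normal does, at the cost of a slightly lower peak, which is why it behaves more robustly under contaminated data.

```python
import math

def gaussian_logpdf(x):
    """Log-density of the standard normal."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(x, nu=3.0):
    """Log-density of the Student-t with nu degrees of freedom."""
    return (math.lgamma((nu + 1.0) / 2.0)
            - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))

x = 6.0  # an outlier, six standard deviations out
print(gaussian_logpdf(x), student_t_logpdf(x))
```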
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.