Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
- URL: http://arxiv.org/abs/2602.09277v1
- Date: Mon, 09 Feb 2026 23:38:11 GMT
- Title: Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
- Authors: Minh Vu, Xiaoliang Wan, Shuangqing Wei
- Abstract summary: The $β$-VAE is a framework for unsupervised disentanglement. Benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse.
- Score: 4.155522769716163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
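The dual-parameter objective described in the abstract can be sketched as follows. This is a minimal illustration assuming a diagonal-Gaussian encoder and a Bernoulli decoder; the function name, the cross-entropy reconstruction term, and the default values of `beta` and `lam` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=1.0, eps=1e-12):
    """Sketch of a lambda-beta-VAE objective: the beta-VAE loss plus an
    auxiliary L2 reconstruction penalty weighted by lam, intended to keep
    the latent channel informative even when beta is large."""
    # Bernoulli reconstruction term (binary cross-entropy).
    recon = -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior.
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    # Auxiliary L2 penalty: decouples regularization pressure (beta)
    # from reconstruction anchoring (lam).
    l2 = np.sum((x - x_hat) ** 2)
    return recon + beta * kl + lam * l2
```

Setting `lam=0` recovers the ordinary $β$-VAE objective; the abstract's claim is that `lam > 0` stabilizes disentanglement over a broader range of `beta`.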
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - A Regularization-Sharpness Tradeoff for Linear Interpolators [8.628516727959259]
We propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell_p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term and a sharpness term. Building on prior analyses that established this information criterion for ridge regularizers, this work provides a general expression of the interpolating information criterion for $\ell_p$ regularizers.
arXiv Detail & Related papers (2026-02-13T07:21:08Z) - Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
arXiv Detail & Related papers (2026-02-10T21:08:53Z) - A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification [0.0]
We show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions. These results show that the triplet $(,,)$ provides a model-agnostic, geometrically grounded explanation of imbalance-induced deterioration.
arXiv Detail & Related papers (2026-01-07T18:02:11Z) - The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise [10.844819221753042]
We use Huber regression as a close-up example within Tikhonov-regularized risk minimization. We address two central challenges: (i) the breakdown of standard concentration tools under weak moment assumptions, and (ii) the analytical difficulties introduced by unbounded hypothesis spaces. Our study delivers principled rules, extends beyond Huber to other robust losses, and highlights prediction error, not excess risk, as the fundamental lens for analyzing robust learning.
arXiv Detail & Related papers (2025-10-10T21:57:18Z) - Noise-induced decoherence-free zones for anyons [0.0]
We develop a framework for anyonic systems in which the exchange phase is promoted from a fixed parameter to a fluctuating quantity. We show that the protected mode always minimizes its dephasing at $\theta^\star = \pi/2$, independent of the specific form of $D$. This highlights a simple design rule for optimizing coherence in noisy anyonic systems.
arXiv Detail & Related papers (2025-10-07T16:21:57Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences [45.935798913942904]
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented.
Examples of the solution for particular choices of the function $f$ are presented.
arXiv Detail & Related papers (2024-02-01T11:12:00Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that $\hat{Z}$ estimated by our method carries less redundant information than previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning [15.317044259237043]
This paper challenges the common assumption that the weight $β$ in the $β$-VAE should be larger than $1$ in order to effectively disentangle latent factors.
We demonstrate that the $β$-VAE with $β \leq 1$ can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control.
arXiv Detail & Related papers (2020-09-15T00:01:11Z)
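The dynamic-control idea in the last entry can be illustrated with a simple proportional-integral update of the KL weight toward a target KL value. This is a hypothetical sketch, not DynamicVAE's actual controller: the class name, gains, clamping range, and set point are all illustrative assumptions.

```python
class PIBetaController:
    """Hypothetical PI controller that adjusts the KL weight beta so the
    observed KL divergence tracks a target value (set point)."""

    def __init__(self, target_kl, kp=0.01, ki=0.001, beta_min=0.0, beta_max=1.0):
        self.target_kl = target_kl
        self.kp, self.ki = kp, ki
        self.beta_min, self.beta_max = beta_min, beta_max
        self._integral = 0.0  # accumulated error for the integral term

    def step(self, observed_kl):
        # KL above target -> positive error -> raise beta to push KL back down.
        error = observed_kl - self.target_kl
        self._integral += error
        beta = self.kp * error + self.ki * self._integral
        # Clamp so beta stays in a sensible range (here allowing beta <= 1).
        return min(max(beta, self.beta_min), self.beta_max)
```

Called once per training step with the current batch's KL, the controller raises `beta` while the KL overshoots the target and lowers it otherwise, instead of fixing `beta` in advance.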
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.