Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
- URL: http://arxiv.org/abs/2602.09277v1
- Date: Mon, 09 Feb 2026 23:38:11 GMT
- Title: Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
- Authors: Minh Vu, Xiaoliang Wan, Shuangqing Wei
- Abstract summary: The $β$-VAE is a framework for unsupervised disentanglement. Benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse.
- Score: 4.155522769716163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
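The dual-parameter objective described in the abstract can be sketched as follows. This is a minimal illustration assuming a diagonal-Gaussian encoder and a Bernoulli decoder; the function name, the cross-entropy reconstruction term, and the default values of `beta` and `lam` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=1.0, eps=1e-12):
    """Sketch of a lambda-beta-VAE objective: the beta-VAE loss plus an
    auxiliary L2 reconstruction penalty weighted by lam, intended to keep
    the latent channel informative even when beta is large."""
    # Bernoulli reconstruction term (binary cross-entropy).
    recon = -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior.
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    # Auxiliary L2 penalty: decouples regularization pressure (beta)
    # from reconstruction anchoring (lam).
    l2 = np.sum((x - x_hat) ** 2)
    return recon + beta * kl + lam * l2
```

Setting `lam=0` recovers the ordinary $β$-VAE objective; the abstract's claim is that `lam > 0` stabilizes disentanglement over a broader range of `beta`.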
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - A Regularization-Sharpness Tradeoff for Linear Interpolators [8.628516727959259]
We propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell_p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term and a sharpness term. Building on prior analyses that established this information criterion for ridge regularizers, this work provides a general expression of the interpolating information criterion for $\ell_p$ regularizers.
arXiv Detail & Related papers (2026-02-13T07:21:08Z) - Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
arXiv Detail & Related papers (2026-02-10T21:08:53Z) - A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification [0.0]
We show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions. These results show that the triplet $(,,)$ provides a model-agnostic, geometrically grounded explanation of imbalance-induced deterioration.
arXiv Detail & Related papers (2026-01-07T18:02:11Z) - The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise [10.844819221753042]
We use Huber regression as a close-up example within Tikhonov-regularized risk minimization. We address two central challenges: (i) the breakdown of standard concentration tools under weak moment assumptions, and (ii) the analytical difficulties introduced by unbounded hypothesis spaces. Our study delivers principled rules, extends beyond Huber to other robust losses, and highlights prediction error, not excess risk, as the fundamental lens for analyzing robust learning.
arXiv Detail & Related papers (2025-10-10T21:57:18Z) - Noise-induced decoherence-free zones for anyons [0.0]
We develop a framework for anyonic systems in which the exchange phase is promoted from a fixed parameter to a fluctuating quantity. We show that the protected mode always minimizes its dephasing at $\theta^\star = \pi/2$, independent of the specific form of $D$. This highlights a simple design rule for optimizing coherence in noisy anyonic systems.
arXiv Detail & Related papers (2025-10-07T16:21:57Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences [45.935798913942904]
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented.
Examples of the solution for particular choices of the function $f$ are presented.
arXiv Detail & Related papers (2024-02-01T11:12:00Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that $\hat{Z}$ estimated by our method carries less redundant information than previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning [15.317044259237043]
This paper challenges the common assumption that the weight $β$ in the $β$-VAE should be larger than $1$ in order to effectively disentangle latent factors.
We demonstrate that the $β$-VAE with $β \leq 1$ can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control.
arXiv Detail & Related papers (2020-09-15T00:01:11Z)
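The dynamic-control idea in the last entry can be illustrated with a simple proportional-integral update of the KL weight toward a target KL value. This is a hypothetical sketch, not DynamicVAE's actual controller: the class name, gains, clamping range, and set point are all illustrative assumptions.

```python
class PIBetaController:
    """Hypothetical PI controller that adjusts the KL weight beta so the
    observed KL divergence tracks a target value (set point)."""

    def __init__(self, target_kl, kp=0.01, ki=0.001, beta_min=0.0, beta_max=1.0):
        self.target_kl = target_kl
        self.kp, self.ki = kp, ki
        self.beta_min, self.beta_max = beta_min, beta_max
        self._integral = 0.0  # accumulated error for the integral term

    def step(self, observed_kl):
        # KL above target -> positive error -> raise beta to push KL back down.
        error = observed_kl - self.target_kl
        self._integral += error
        beta = self.kp * error + self.ki * self._integral
        # Clamp so beta stays in a sensible range (here allowing beta <= 1).
        return min(max(beta, self.beta_min), self.beta_max)
```

Called once per training step with the current batch's KL, the controller raises `beta` while the KL overshoots the target and lowers it otherwise, instead of fixing `beta` in advance.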
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.