Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
- URL: http://arxiv.org/abs/2601.23039v3
- Date: Wed, 04 Feb 2026 19:27:01 GMT
- Title: Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
- Authors: Yizhi Liu
- Abstract summary: We identify a fundamental mechanism for this failure: \textbf{Premature Mode Collapse}. We propose \textbf{Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC)}, an adaptive scheduling algorithm that monitors the stability of the inference process.
- Score: 1.7523718031184992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable matching layers and residual connection paradigms, often implemented via entropy-regularized Optimal Transport (OT), serve as critical mechanisms in structural prediction and architectural scaling. However, recovering discrete permutations or maintaining identity mappings via annealing $\varepsilon \to 0$ is notoriously unstable. In this work, we identify a fundamental mechanism for this failure: \textbf{Premature Mode Collapse}. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical thermodynamic speed limit: standard exponential cooling outpaces the contraction rate of the inference operator, which degrades as $O(1/\varepsilon)$. To address this, we propose \textbf{Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC)}, an adaptive scheduling algorithm that monitors the stability of the inference process. We demonstrate that EPH-ASC is essential for stabilizing Manifold-Constrained Hyper-Connections (mHC) during large-scale training on the FineWeb-Edu dataset, effectively preventing late-stage gradient explosions by enforcing a linear stability law.
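The abstract gives no pseudocode for EPH-ASC, but its core idea, gating the cooling of $\varepsilon$ on an explicit stability check of the Sinkhorn fixed point, can be sketched as follows. This is a minimal illustrative sketch, not the authors' algorithm: the residual-based stability test, the multiplicative `shrink` rule, and the uniform marginals are all assumptions.

```python
import numpy as np

def sinkhorn(C, eps, iters=200):
    """Entropy-regularized OT: Sinkhorn fixed-point iteration for cost matrix C.
    (A practical version would iterate in the log domain to avoid underflow
    as eps -> 0.)"""
    n, m = C.shape
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        v = 1.0 / (K.T @ u)   # enforce unit column sums
        u = 1.0 / (K @ v)     # enforce unit row sums
    return u[:, None] * K * v[None, :]

def adaptive_anneal(C, eps0=1.0, eps_min=1e-2, shrink=0.9, tol=1e-4):
    """Cool eps toward eps_min, accepting a step only while the coupling is stable."""
    eps = eps0
    P = sinkhorn(C, eps)
    while eps > eps_min:
        eps_next = max(eps * shrink, eps_min)
        P_next = sinkhorn(C, eps_next)
        if np.abs(P_next - P).max() < tol:   # stable: accept the cooling step
            eps, P = eps_next, P_next
        else:                                # unstable: cool more slowly
            shrink = np.sqrt(shrink)
            if shrink > 0.999:               # schedule has stalled; stop
                break
    return P, eps
```

The point mirrored from the abstract is that cooling is gated by re-verifying the fixed point rather than running on a fixed exponential clock, since the operator's contraction degrades as $O(1/\varepsilon)$.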
Related papers
- Physics-informed post-processing of stabilized finite element solutions for transient convection-dominated problems [0.0]
This work presents a hybrid computational framework that extends the PINN-Augmented SUPG with Shock-Capturing (PASSC) methodology. The approach combines a semi-discrete stabilized finite element method with a PINN-based correction strategy for transient convection-diffusion equations.
arXiv Detail & Related papers (2026-03-03T18:51:17Z)
- Entropy-Controlled Flow Matching [0.08460698440162889]
We propose a constrained variational principle over continuity-equation paths enforcing a global entropy-rate budget $\frac{d}{dt} H(\mu_t) \ge -\lambda$. We obtain certificate-style mode-coverage and density-floor guarantees under Lipschitz assumptions, and construct near-optimal counterexamples for unconstrained flow matching.
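One plausible way to write such a constrained principle, reconstructed from the summary with a standard kinetic-energy objective and differential entropy $H(\mu) = -\int \mu \log \mu \, dx$ (not necessarily the paper's exact formulation):

$$
\min_{(\mu_t,\, v_t)} \int_0^1 \mathbb{E}_{x \sim \mu_t}\big[\|v_t(x)\|^2\big]\,dt
\quad \text{s.t.} \quad
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0,
\qquad
\frac{d}{dt} H(\mu_t) \ge -\lambda .
$$

The budget caps how fast the flow may destroy entropy, which is what rules out collapse onto isolated modes and yields density-floor style guarantees.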
arXiv Detail & Related papers (2026-02-25T06:07:01Z)
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-sum based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
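A minimal numpy sketch of the de-biased push-sum recursion underlying SGP; the column-stochastic mixing matrix, step size, and gradient oracle are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def sgp_round(X, W, P, grads, lr):
    """One Stochastic Gradient Push round over a directed graph.

    X : (n_nodes, dim) local parameters; W : (n_nodes,) push-sum weights,
    initialized to ones; P : column-stochastic mixing matrix, where
    P[i, j] is the weight node j pushes to node i.
    """
    X = P @ (X - lr * grads)   # take a local SGD step, then push along edges
    W = P @ W                  # push the de-biasing weights the same way
    Z = X / W[:, None]         # de-biased estimates; in practice the gradients
    return X, W, Z             # are evaluated at these Z iterates
```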
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
- KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra [65.11254608352982]
We introduce a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation.
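The Cartesian decomposition the summary refers to is the standard operator identity, stated here for a generator $\mathcal{L}$ with adjoint $\mathcal{L}^{*}$:

$$
\mathcal{L}
= \underbrace{\tfrac{1}{2}\big(\mathcal{L} - \mathcal{L}^{*}\big)}_{\text{skew-adjoint: conservative transport}}
+ \underbrace{\tfrac{1}{2}\big(\mathcal{L} + \mathcal{L}^{*}\big)}_{\text{self-adjoint: irreversible dissipation}} .
$$

Parameterizing the two parts separately is presumably what lets the model keep transport exactly skew-adjoint and dissipation exactly self-adjoint by construction.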
arXiv Detail & Related papers (2026-02-15T06:32:23Z)
- Generalizing GNNs with Tokenized Mixture of Experts [75.8310720413187]
We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
arXiv Detail & Related papers (2026-02-09T22:48:30Z)
- Memory-Conditioned Flow-Matching for Stable Autoregressive PDE Rollouts [0.0]
Autoregressive generative PDE solvers can be accurate one step ahead yet drift over long rollouts. We show that eliminating unresolved variables yields an exact resolved evolution with a Markov term. We then derive discrete Grönwall rollout bounds that separate memory approximation error from conditional generation error.
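For orientation, the generic discrete Grönwall argument that such rollout bounds build on, assuming a per-step recursion with amplification factor $(1+\eta L)$ and injected one-step error $\delta$ (the paper's memory-aware version refines this split):

$$
e_{k+1} \le (1+\eta L)\, e_k + \delta
\quad \Longrightarrow \quad
e_k \le (1+\eta L)^{k} e_0 + \frac{(1+\eta L)^{k} - 1}{\eta L}\,\delta .
$$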
arXiv Detail & Related papers (2026-02-06T13:21:52Z)
- Dissipative Learning: A Framework for Viable Adaptive Systems [0.6345523830122167]
We introduce the BEDS (Bayesian Emergent Dissipative Structures) framework, modeling learning as the evolution of compressed belief states under dissipation constraints. A central contribution is the Optimality Theorem, showing that Fisher-Rao regularization, which measures change via information divergence rather than Euclidean distance, is the unique thermodynamically optimal regularization strategy.
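The contrast the theorem draws shows up already in the proximal form of a regularized update; the following is a textbook identity, not the BEDS derivation. With a Euclidean metric ($M = I$) one recovers plain gradient descent; with the Fisher metric ($M = F(\theta_k)$, the local quadratic form of the KL divergence) one gets the natural-gradient update:

$$
\theta_{k+1}
= \arg\min_{\theta}\; \langle \nabla L(\theta_k),\, \theta - \theta_k \rangle
+ \frac{1}{2\eta}\, \|\theta - \theta_k\|_{M}^{2}
= \theta_k - \eta\, M^{-1} \nabla L(\theta_k) .
$$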
arXiv Detail & Related papers (2026-01-25T18:10:15Z)
- The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
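A minimal sketch of the frequency-domain fix: compute the point-wise loss on DFT coefficients rather than raw time steps. The use of `rfft` and plain MSE weighting are our assumptions; the paper may pair the transforms differently.

```python
import numpy as np

def dft_mse(pred, target):
    """MSE between real-FFT coefficients of forecast and ground truth.

    Structured, near-deterministic series concentrate energy in a few
    frequencies, so errors there are penalized without the phase-alignment
    bias a raw point-wise loss incurs.
    """
    P = np.fft.rfft(pred, axis=-1)
    T = np.fft.rfft(target, axis=-1)
    return np.mean(np.abs(P - T) ** 2)
```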
arXiv Detail & Related papers (2025-12-21T06:08:22Z)
- Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse [3.533187668612022]
We present Entropy-Reservoir Bregman Projection (ERBP), an information-geometric framework that unifies these phenomena. Our theory yields (i) a necessary condition for collapse, (ii) a sufficient condition that guarantees a non-vanishing entropy floor, and (iii) closed-form rates that depend on sample size.
arXiv Detail & Related papers (2025-12-16T19:50:03Z)
- A Class of Accelerated Fixed-Point-Based Methods with Delayed Inexact Oracles and Its Applications [3.6997773420183866]
We develop a fixed-point-based framework using delayed inexact oracles to approximate a fixed point of a nonexpansive operator. Our approach leverages both Nesterov's acceleration technique and the Krasnosel'skii-Mann (KM) iteration.
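A minimal sketch combining the two ingredients named above, a KM averaging step and Nesterov-style extrapolation, driven by a delayed, inexact oracle; the operator, delay model, and step choices are illustrative, not the paper's scheme.

```python
import numpy as np

def accelerated_km(T_op, x0, steps=500, km_step=0.5, delay=3, noise=1e-6, seed=0):
    """Approximate a fixed point of a nonexpansive operator T_op.

    The oracle is delayed and inexact: it evaluates T_op at the iterate
    from `delay` steps ago and perturbs the result with small noise.
    """
    rng = np.random.default_rng(seed)
    hist = [np.asarray(x0, dtype=float)]
    x_prev = hist[0]
    for k in range(steps):
        x = hist[-1]
        y = x + (k / (k + 3)) * (x - x_prev)            # Nesterov extrapolation
        stale = hist[max(0, len(hist) - 1 - delay)]     # delayed query point
        T_approx = T_op(stale) + noise * rng.standard_normal(np.shape(x))
        x_new = (1 - km_step) * y + km_step * T_approx  # Krasnosel'skii-Mann step
        x_prev = x
        hist.append(x_new)
    return hist[-1]
```

For example, `T_op = lambda z: 0.5 * (z + np.clip(z, 0.0, 1.0))` is an averaged projection whose fixed points are exactly the points of the unit box.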
arXiv Detail & Related papers (2025-12-15T17:06:22Z)
- Latency and Ordering Effects in Online Decisions [0.0]
Online decision systems operate under delayed feedback and order-sensitive dynamics. We package heterogeneous latency, noncommutativity, and implementation-gap effects into a single lower-bound statement.
arXiv Detail & Related papers (2025-11-17T07:08:05Z)
- INC: An Indirect Neural Corrector for Auto-Regressive Hybrid PDE Solvers [61.84396402100827]
We propose the Indirect Neural Corrector ($\mathrm{INC}$), which integrates learned corrections into the governing equations. $\mathrm{INC}$ reduces error amplification on the order of $t^{-1} + L$, where $t$ is the timestep and $L$ is the Lipschitz constant. We test $\mathrm{INC}$ in extensive benchmarks covering numerous differentiable solvers, neural backbones, and test cases ranging from a 1D chaotic system to 3D turbulence.
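Our reading of "integrates learned corrections into the governing equations", sketched against a plain explicit-Euler step (the function names are placeholders): a direct corrector patches the state after the solver step, while the indirect corrector injects the learned term into the equation's right-hand side, so the solver itself integrates and damps it.

```python
def step_direct(u, dt, rhs, net):
    """Direct correction: fix up the state after the solver step."""
    u_star = u + dt * rhs(u)           # explicit-Euler solver step
    return u_star + net(u_star)        # additive state correction

def step_indirect(u, dt, rhs, net):
    """Indirect correction (INC-style): correct the equation, not the state."""
    return u + dt * (rhs(u) + net(u))  # learned term enters the ODE/PDE RHS
```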
arXiv Detail & Related papers (2025-11-16T20:14:28Z)
- Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping [0.0]
We introduce \textbf{Chebyshev Moment Regularization (CMR)}, a simple, architecture-agnostic loss that directly optimizes layer spectra. CMR jointly controls the spectral edges via a log-condition proxy and shapes the interior via Chebyshev moments. These results support \textbf{optimization-driven spectral preconditioning}: directly steering models toward well-conditioned regimes for stable, accurate learning.
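A rough numpy sketch of a loss in this spirit: penalize a log-condition proxy at the spectral edges and match low-order Chebyshev moments in the interior. The moment targets, the rescaling of singular values to $[-1, 1]$, and the weight $\gamma$ are our assumptions, not the paper's construction.

```python
import numpy as np

def cmr_style_loss(W, target_moments, gamma=0.1, eps=1e-8):
    """Spectral shaping loss for a weight matrix W.

    Edge control : squared log-condition number of W.
    Interior     : Chebyshev moments T_k of the rescaled singular values,
                   matched to target_moments (k = 1..K), using
                   T_k(x) = cos(k * arccos(x)) on [-1, 1].
    """
    s = np.linalg.svd(W, compute_uv=False)               # descending order
    log_cond = np.log(s[0] + eps) - np.log(s[-1] + eps)
    x = 2.0 * (s - s[-1]) / (s[0] - s[-1] + eps) - 1.0   # rescale to [-1, 1]
    moments = np.array([np.mean(np.cos(k * np.arccos(np.clip(x, -1, 1))))
                        for k in range(1, len(target_moments) + 1)])
    return log_cond ** 2 + gamma * np.sum((moments - np.asarray(target_moments)) ** 2)
```

A trainable version would live in an autodiff framework with a differentiable spectral surrogate; the numpy form only separates the two terms.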
arXiv Detail & Related papers (2025-10-17T06:54:41Z)
- ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) model should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose ERIS, an end-to-end Energy-Regularized Information for Shift-Robustness framework, to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z)
- Theoretical Framework for Tempered Fractional Gradient Descent: Application to Breast Cancer Classification [0.0]
This paper introduces Tempered Fractional Gradient Descent (TFGD), a novel optimization framework that synergizes fractional calculus with exponential tempering to enhance gradient-based learning. TFGD addresses the limitations of conventional gradient methods by incorporating a tempered memory mechanism, where historical gradients are weighted by fractional binomial coefficients $|w_j| = \binom{\alpha}{j}$ and exponentially decayed via a tempering parameter $\lambda$. Empirical validation on the Breast Cancer Wisconsin dataset demonstrates TFGD's superiority, achieving 98.25% test accuracy (vs. 92.11% for SGD) and $2\times$ faster convergence.
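From the quantities the summary does give ($|w_j| = \binom{\alpha}{j}$, tempering factor $e^{-\lambda j}$), a sketch of the memory weights and a weighted-history update; the truncation depth and the exact update form are our assumptions.

```python
from math import exp

def tfgd_weights(alpha, lam, depth):
    """Tempered fractional memory weights |w_j| = binom(alpha, j) * exp(-lam * j).

    The generalized binomial coefficient (real alpha) is built by the
    recurrence binom(alpha, j) = binom(alpha, j-1) * (alpha - j + 1) / j.
    """
    w, out = 1.0, [1.0]
    for j in range(1, depth):
        w *= (alpha - j + 1) / j
        out.append(abs(w) * exp(-lam * j))
    return out

def tfgd_update(theta, grad_history, lr, weights):
    """theta <- theta - lr * sum_j w_j * g_{t-j} (most recent gradient first)."""
    recent = reversed(grad_history[-len(weights):])
    step = sum(w * g for w, g in zip(weights, recent))
    return theta - lr * step
```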
arXiv Detail & Related papers (2025-04-26T08:26:34Z)
- Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown. We use a ratio-free nested multi-time-scale stochastic approximation (SA) method that simultaneously tracks the score and drives the parameter update. We show that our algorithm can eliminate the original bias of order $O\big(\sqrt{1/N}\big)$ and improve upon the convergence rate $O\big(\beta_k + \sqrt{\alpha_k/N}\big)$.
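A generic two-time-scale template matching that description: a fast recursion with step $\beta_k$ tracks the simulator-based score while a slow recursion with step $\alpha_k$ updates the parameter. The step-size exponents and the score estimator are placeholders, not the paper's choices.

```python
import numpy as np

def nested_two_timescale(score_estimate, theta0, iters=10_000, seed=0):
    """Fast iterate s tracks a noisy N-sample score estimate at theta;
    slow iterate theta ascends the tracked score (likelihood-free)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    s = np.zeros_like(theta)
    for k in range(1, iters + 1):
        alpha_k = 1.0 / k               # slow step size
        beta_k = 1.0 / k ** (2 / 3)     # fast step size (beta_k >> alpha_k)
        g = score_estimate(theta, rng)  # noisy simulator-based score sample
        s += beta_k * (g - s)           # fast: track the score
        theta += alpha_k * s            # slow: parameter update
    return theta
```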
arXiv Detail & Related papers (2024-11-20T02:46:15Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, data by data. Existing gradient updates heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. Our theoretical results indicate that asynchronous delays can reduce the generalization error of the delayed SGD algorithm.
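The delayed-SGD recursion this analysis concerns, in a few lines; the fixed delay $\tau$ and the quadratic toy objective are illustrative.

```python
import numpy as np

def delayed_sgd(grad, w0, lr=0.05, tau=4, steps=200):
    """w_{t+1} = w_t - lr * grad(w_{t - tau}): gradients arrive tau steps stale."""
    iterates = [np.asarray(w0, dtype=float)]
    for t in range(steps):
        stale = iterates[max(0, t - tau)]  # the iterate the worker actually saw
        iterates.append(iterates[-1] - lr * grad(stale))
    return iterates[-1]

# e.g. the quadratic objective 0.5 * ||w||^2, whose gradient is grad(w) = w
w_star = delayed_sgd(lambda w: w, w0=np.ones(3))
```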
arXiv Detail & Related papers (2023-08-18T10:00:27Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime. We prove convergence of the training loss with $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations. With $m=\Omega(\log^4(n))$ neurons and $T\approx n$, we bound the test loss by $\tilde{O}(1/\cdot)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.