Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
- URL: http://arxiv.org/abs/2601.23039v3
- Date: Wed, 04 Feb 2026 19:27:01 GMT
- Title: Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
- Authors: Yizhi Liu
- Abstract summary: We identify a fundamental mechanism for this failure: \textbf{Premature Mode Collapse}. We propose \textbf{Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC)}, an adaptive scheduling algorithm that monitors the stability of the inference process.
- Score: 1.7523718031184992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable matching layers and residual connection paradigms, often implemented via entropy-regularized Optimal Transport (OT), serve as critical mechanisms in structural prediction and architectural scaling. However, recovering discrete permutations or maintaining identity mappings via annealing $\varepsilon \to 0$ is notoriously unstable. In this work, we identify a fundamental mechanism for this failure: \textbf{Premature Mode Collapse}. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical thermodynamic speed limit: standard exponential cooling outpaces the contraction rate of the inference operator, which degrades as $O(1/\varepsilon)$. To address this, we propose \textbf{Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC)}, an adaptive scheduling algorithm that monitors the stability of the inference process. We demonstrate that EPH-ASC is essential for stabilizing Manifold-Constrained Hyper-Connections (mHC) during large-scale training on the FineWeb-Edu dataset, effectively preventing late-stage gradient explosions by enforcing a linear stability law.
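The abstract gives no pseudocode for EPH-ASC, but its core idea, gating the cooling of $\varepsilon$ on an explicit stability check of the Sinkhorn fixed point, can be sketched as follows. This is a minimal illustrative sketch, not the authors' algorithm: the residual-based stability test, the multiplicative `shrink` rule, and the uniform marginals are all assumptions.

```python
import numpy as np

def sinkhorn(C, eps, iters=200):
    """Entropy-regularized OT: Sinkhorn fixed-point iteration for cost matrix C.
    (A practical version would iterate in the log domain to avoid underflow
    as eps -> 0.)"""
    n, m = C.shape
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        v = 1.0 / (K.T @ u)   # enforce unit column sums
        u = 1.0 / (K @ v)     # enforce unit row sums
    return u[:, None] * K * v[None, :]

def adaptive_anneal(C, eps0=1.0, eps_min=1e-2, shrink=0.9, tol=1e-4):
    """Cool eps toward eps_min, accepting a step only while the coupling is stable."""
    eps = eps0
    P = sinkhorn(C, eps)
    while eps > eps_min:
        eps_next = max(eps * shrink, eps_min)
        P_next = sinkhorn(C, eps_next)
        if np.abs(P_next - P).max() < tol:   # stable: accept the cooling step
            eps, P = eps_next, P_next
        else:                                # unstable: cool more slowly
            shrink = np.sqrt(shrink)
            if shrink > 0.999:               # schedule has stalled; stop
                break
    return P, eps
```

The point mirrored from the abstract is that cooling is gated by re-verifying the fixed point rather than running on a fixed exponential clock, since the operator's contraction degrades as $O(1/\varepsilon)$.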
Related papers
- Physics-informed post-processing of stabilized finite element solutions for transient convection-dominated problems [0.0]
This work presents a hybrid computational framework that extends the PINN-Augmented SUPG with Shock-Capturing (PASSC) methodology. The approach combines a semi-discrete stabilized finite element method with a PINN-based correction strategy for transient convection-diffusion equations.
arXiv Detail & Related papers (2026-03-03T18:51:17Z)
- Entropy-Controlled Flow Matching [0.08460698440162889]
We propose a constrained variational principle over continuity-equation paths enforcing a global entropy-rate budget $\frac{d}{dt} H(\mu_t) \ge -\lambda$. We obtain certificate-style mode-coverage and density-floor guarantees under Lipschitz assumptions, and construct near-optimal counterexamples for unconstrained flow matching.
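One plausible way to write such a constrained principle, reconstructed from the summary with a standard kinetic-energy objective and differential entropy $H(\mu) = -\int \mu \log \mu \, dx$ (not necessarily the paper's exact formulation):

$$
\min_{(\mu_t,\, v_t)} \int_0^1 \mathbb{E}_{x \sim \mu_t}\big[\|v_t(x)\|^2\big]\,dt
\quad \text{s.t.} \quad
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0,
\qquad
\frac{d}{dt} H(\mu_t) \ge -\lambda .
$$

The budget caps how fast the flow may destroy entropy, which is what rules out collapse onto isolated modes and yields density-floor style guarantees.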
arXiv Detail & Related papers (2026-02-25T06:07:01Z)
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-sum based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
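A minimal numpy sketch of the de-biased push-sum recursion underlying SGP; the column-stochastic mixing matrix, step size, and gradient oracle are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def sgp_round(X, W, P, grads, lr):
    """One Stochastic Gradient Push round over a directed graph.

    X : (n_nodes, dim) local parameters; W : (n_nodes,) push-sum weights,
    initialized to ones; P : column-stochastic mixing matrix, where
    P[i, j] is the weight node j pushes to node i.
    """
    X = P @ (X - lr * grads)   # take a local SGD step, then push along edges
    W = P @ W                  # push the de-biasing weights the same way
    Z = X / W[:, None]         # de-biased estimates; in practice the gradients
    return X, W, Z             # are evaluated at these Z iterates
```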
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
- KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra [65.11254608352982]
We introduce a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation.
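The Cartesian decomposition the summary refers to is the standard operator identity, stated here for a generator $\mathcal{L}$ with adjoint $\mathcal{L}^{*}$:

$$
\mathcal{L}
= \underbrace{\tfrac{1}{2}\big(\mathcal{L} - \mathcal{L}^{*}\big)}_{\text{skew-adjoint: conservative transport}}
+ \underbrace{\tfrac{1}{2}\big(\mathcal{L} + \mathcal{L}^{*}\big)}_{\text{self-adjoint: irreversible dissipation}} .
$$

Parameterizing the two parts separately is presumably what lets the model keep transport exactly skew-adjoint and dissipation exactly self-adjoint by construction.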
arXiv Detail & Related papers (2026-02-15T06:32:23Z)
- Generalizing GNNs with Tokenized Mixture of Experts [75.8310720413187]
We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
arXiv Detail & Related papers (2026-02-09T22:48:30Z)
- Memory-Conditioned Flow-Matching for Stable Autoregressive PDE Rollouts [0.0]
Autoregressive generative PDE solvers can be accurate one step ahead yet drift over long rollouts. We show that eliminating unresolved variables yields an exact resolved evolution with a Markov term. We then derive discrete Grönwall rollout bounds that separate memory approximation error from conditional generation error.
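For orientation, the generic discrete Grönwall argument that such rollout bounds build on, assuming a per-step recursion with amplification factor $(1+\eta L)$ and injected one-step error $\delta$ (the paper's memory-aware version refines this split):

$$
e_{k+1} \le (1+\eta L)\, e_k + \delta
\quad \Longrightarrow \quad
e_k \le (1+\eta L)^{k} e_0 + \frac{(1+\eta L)^{k} - 1}{\eta L}\,\delta .
$$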
arXiv Detail & Related papers (2026-02-06T13:21:52Z)
- Dissipative Learning: A Framework for Viable Adaptive Systems [0.6345523830122167]
We introduce the BEDS (Bayesian Emergent Dissipative Structures) framework, modeling learning as the evolution of compressed belief states under dissipation constraints. A central contribution is the Optimality Theorem, showing that Fisher-Rao regularization, which measures change via information divergence rather than Euclidean distance, is the unique thermodynamically optimal regularization strategy.
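The contrast the theorem draws shows up already in the proximal form of a regularized update; the following is a textbook identity, not the BEDS derivation. With a Euclidean metric ($M = I$) one recovers plain gradient descent; with the Fisher metric ($M = F(\theta_k)$, the local quadratic form of the KL divergence) one gets the natural-gradient update:

$$
\theta_{k+1}
= \arg\min_{\theta}\; \langle \nabla L(\theta_k),\, \theta - \theta_k \rangle
+ \frac{1}{2\eta}\, \|\theta - \theta_k\|_{M}^{2}
= \theta_k - \eta\, M^{-1} \nabla L(\theta_k) .
$$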
arXiv Detail & Related papers (2026-01-25T18:10:15Z)
- The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
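A minimal sketch of the frequency-domain fix: compute the point-wise loss on DFT coefficients rather than raw time steps. The use of `rfft` and plain MSE weighting are our assumptions; the paper may pair the transforms differently.

```python
import numpy as np

def dft_mse(pred, target):
    """MSE between real-FFT coefficients of forecast and ground truth.

    Structured, near-deterministic series concentrate energy in a few
    frequencies, so errors there are penalized without the phase-alignment
    bias a raw point-wise loss incurs.
    """
    P = np.fft.rfft(pred, axis=-1)
    T = np.fft.rfft(target, axis=-1)
    return np.mean(np.abs(P - T) ** 2)
```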
arXiv Detail & Related papers (2025-12-21T06:08:22Z)
- Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse [3.533187668612022]
We present Entropy-Reservoir Bregman Projection (ERBP), an information-geometric framework that unifies these phenomena. Our theory yields (i) a necessary condition for collapse, (ii) a sufficient condition that guarantees a non-vanishing entropy floor, and (iii) closed-form rates that depend on sample size.
arXiv Detail & Related papers (2025-12-16T19:50:03Z)
- A Class of Accelerated Fixed-Point-Based Methods with Delayed Inexact Oracles and Its Applications [3.6997773420183866]
We develop a fixed-point-based framework using delayed inexact oracles to approximate a fixed point of a nonexpansive operator. Our approach leverages both Nesterov's acceleration technique and the Krasnosel'skii-Mann (KM) iteration.
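A minimal sketch combining the two ingredients named above, a KM averaging step and Nesterov-style extrapolation, driven by a delayed, inexact oracle; the operator, delay model, and step choices are illustrative, not the paper's scheme.

```python
import numpy as np

def accelerated_km(T_op, x0, steps=500, km_step=0.5, delay=3, noise=1e-6, seed=0):
    """Approximate a fixed point of a nonexpansive operator T_op.

    The oracle is delayed and inexact: it evaluates T_op at the iterate
    from `delay` steps ago and perturbs the result with small noise.
    """
    rng = np.random.default_rng(seed)
    hist = [np.asarray(x0, dtype=float)]
    x_prev = hist[0]
    for k in range(steps):
        x = hist[-1]
        y = x + (k / (k + 3)) * (x - x_prev)            # Nesterov extrapolation
        stale = hist[max(0, len(hist) - 1 - delay)]     # delayed query point
        T_approx = T_op(stale) + noise * rng.standard_normal(np.shape(x))
        x_new = (1 - km_step) * y + km_step * T_approx  # Krasnosel'skii-Mann step
        x_prev = x
        hist.append(x_new)
    return hist[-1]
```

For example, `T_op = lambda z: 0.5 * (z + np.clip(z, 0.0, 1.0))` is an averaged projection whose fixed points are exactly the points of the unit box.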
arXiv Detail & Related papers (2025-12-15T17:06:22Z)
- Latency and Ordering Effects in Online Decisions [0.0]
Online decision systems operate under delayed feedback and order-sensitive dynamics. We package heterogeneous latency, noncommutativity, and implementation-gap effects into a single lower-bound statement.
arXiv Detail & Related papers (2025-11-17T07:08:05Z)
- INC: An Indirect Neural Corrector for Auto-Regressive Hybrid PDE Solvers [61.84396402100827]
We propose the Indirect Neural Corrector ($\mathrm{INC}$), which integrates learned corrections into the governing equations. $\mathrm{INC}$ reduces error amplification on the order of $t^{-1} + L$, where $t$ is the timestep and $L$ is the Lipschitz constant. We test $\mathrm{INC}$ in extensive benchmarks covering numerous differentiable solvers, neural backbones, and test cases ranging from a 1D chaotic system to 3D turbulence.
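Our reading of "integrates learned corrections into the governing equations", sketched against a plain explicit-Euler step (the function names are placeholders): a direct corrector patches the state after the solver step, while the indirect corrector injects the learned term into the equation's right-hand side, so the solver itself integrates and damps it.

```python
def step_direct(u, dt, rhs, net):
    """Direct correction: fix up the state after the solver step."""
    u_star = u + dt * rhs(u)           # explicit-Euler solver step
    return u_star + net(u_star)        # additive state correction

def step_indirect(u, dt, rhs, net):
    """Indirect correction (INC-style): correct the equation, not the state."""
    return u + dt * (rhs(u) + net(u))  # learned term enters the ODE/PDE RHS
```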
arXiv Detail & Related papers (2025-11-16T20:14:28Z)
- Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping [0.0]
We introduce \textbf{Chebyshev Moment Regularization (CMR)}, a simple, architecture-agnostic loss that directly optimizes layer spectra. CMR jointly controls the spectral edges via a log-condition proxy and shapes the interior via Chebyshev moments. These results support \textbf{optimization-driven spectral preconditioning}: directly steering models toward well-conditioned regimes for stable, accurate learning.
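A rough numpy sketch of a loss in this spirit: penalize a log-condition proxy at the spectral edges and match low-order Chebyshev moments in the interior. The moment targets, the rescaling of singular values to $[-1, 1]$, and the weight $\gamma$ are our assumptions, not the paper's construction.

```python
import numpy as np

def cmr_style_loss(W, target_moments, gamma=0.1, eps=1e-8):
    """Spectral shaping loss for a weight matrix W.

    Edge control : squared log-condition number of W.
    Interior     : Chebyshev moments T_k of the rescaled singular values,
                   matched to target_moments (k = 1..K), using
                   T_k(x) = cos(k * arccos(x)) on [-1, 1].
    """
    s = np.linalg.svd(W, compute_uv=False)               # descending order
    log_cond = np.log(s[0] + eps) - np.log(s[-1] + eps)
    x = 2.0 * (s - s[-1]) / (s[0] - s[-1] + eps) - 1.0   # rescale to [-1, 1]
    moments = np.array([np.mean(np.cos(k * np.arccos(np.clip(x, -1, 1))))
                        for k in range(1, len(target_moments) + 1)])
    return log_cond ** 2 + gamma * np.sum((moments - np.asarray(target_moments)) ** 2)
```

A trainable version would live in an autodiff framework with a differentiable spectral surrogate; the numpy form only separates the two terms.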
arXiv Detail & Related papers (2025-10-17T06:54:41Z)
- ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) model should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose ERIS, an end-to-end Energy-Regularized Information for Shift-Robustness framework, to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z)
- Theoretical Framework for Tempered Fractional Gradient Descent: Application to Breast Cancer Classification [0.0]
This paper introduces Tempered Fractional Gradient Descent (TFGD), a novel optimization framework that synergizes fractional calculus with exponential tempering to enhance gradient-based learning. TFGD addresses the limitations of conventional gradient methods by incorporating a tempered memory mechanism, where historical gradients are weighted by fractional binomial coefficients $|w_j| = \binom{\alpha}{j}$ and exponentially decayed via a tempering parameter $\lambda$. Empirical validation on the Breast Cancer Wisconsin dataset demonstrates TFGD's superiority, achieving 98.25% test accuracy (vs. 92.11% for SGD) and $2\times$ faster convergence.
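From the quantities the summary does give ($|w_j| = \binom{\alpha}{j}$, tempering factor $e^{-\lambda j}$), a sketch of the memory weights and a weighted-history update; the truncation depth and the exact update form are our assumptions.

```python
from math import exp

def tfgd_weights(alpha, lam, depth):
    """Tempered fractional memory weights |w_j| = binom(alpha, j) * exp(-lam * j).

    The generalized binomial coefficient (real alpha) is built by the
    recurrence binom(alpha, j) = binom(alpha, j-1) * (alpha - j + 1) / j.
    """
    w, out = 1.0, [1.0]
    for j in range(1, depth):
        w *= (alpha - j + 1) / j
        out.append(abs(w) * exp(-lam * j))
    return out

def tfgd_update(theta, grad_history, lr, weights):
    """theta <- theta - lr * sum_j w_j * g_{t-j} (most recent gradient first)."""
    recent = reversed(grad_history[-len(weights):])
    step = sum(w * g for w, g in zip(weights, recent))
    return theta - lr * step
```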
arXiv Detail & Related papers (2025-04-26T08:26:34Z)
- Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown. We use a ratio-free nested multi-time-scale stochastic approximation (SA) method that simultaneously tracks the score and drives the parameter update. We show that our algorithm can eliminate the original bias of order $O\big(\sqrt{1/N}\big)$ and improve upon the convergence rate $O\big(\beta_k + \sqrt{\alpha_k/N}\big)$.
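A generic two-time-scale template matching that description: a fast recursion with step $\beta_k$ tracks the simulator-based score while a slow recursion with step $\alpha_k$ updates the parameter. The step-size exponents and the score estimator are placeholders, not the paper's choices.

```python
import numpy as np

def nested_two_timescale(score_estimate, theta0, iters=10_000, seed=0):
    """Fast iterate s tracks a noisy N-sample score estimate at theta;
    slow iterate theta ascends the tracked score (likelihood-free)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    s = np.zeros_like(theta)
    for k in range(1, iters + 1):
        alpha_k = 1.0 / k               # slow step size
        beta_k = 1.0 / k ** (2 / 3)     # fast step size (beta_k >> alpha_k)
        g = score_estimate(theta, rng)  # noisy simulator-based score sample
        s += beta_k * (g - s)           # fast: track the score
        theta += alpha_k * s            # slow: parameter update
    return theta
```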
arXiv Detail & Related papers (2024-11-20T02:46:15Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, data by data. Existing gradient updates heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. Our theoretical results indicate that asynchronous delays can reduce the generalization error of the delayed SGD algorithm.
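The delayed-SGD recursion this analysis concerns, in a few lines; the fixed delay $\tau$ and the quadratic toy objective are illustrative.

```python
import numpy as np

def delayed_sgd(grad, w0, lr=0.05, tau=4, steps=200):
    """w_{t+1} = w_t - lr * grad(w_{t - tau}): gradients arrive tau steps stale."""
    iterates = [np.asarray(w0, dtype=float)]
    for t in range(steps):
        stale = iterates[max(0, t - tau)]  # the iterate the worker actually saw
        iterates.append(iterates[-1] - lr * grad(stale))
    return iterates[-1]

# e.g. the quadratic objective 0.5 * ||w||^2, whose gradient is grad(w) = w
w_star = delayed_sgd(lambda w: w, w0=np.ones(3))
```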
arXiv Detail & Related papers (2023-08-18T10:00:27Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime. We prove convergence of the training loss with $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations. With $m=\Omega(\log^4(n))$ neurons and $T\approx n$, we bound the test loss by $\tilde{O}(1/\cdot)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.