Related papers: Why Smooth Stability Assumptions Fail for ReLU Learning

Related papers

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials [33.36228161076605]
The stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms.<n>Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays.<n>We show that nonlinear dynamics can diverge in expectation even if a single batch is unstable.
arXiv Detail & Related papers (2026-02-16T14:36:55Z)
A Function-Space Stability Boundary for Generalization in Interpolating Learning Systems [0.0]
We model training as a function-space trajectory and measure sensitivity to single-sample perturbations along this trajectory.<n>A small certificate implies stability-based generalization, while we also prove that there exist interpolating regimes with small risk.
arXiv Detail & Related papers (2026-02-03T13:31:12Z)
Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z)
Stability as a Liability:Systematic Breakdown of Linguistic Structure in LLMs [5.96875296117642]
We show that stable parameter trajectories lead stationary solutions to minimize the forward KL divergence to the empirical distribution.<n>We empirically validate this effect using a controlled feedback-based training framework.<n>It indicates that optimization stability and generative expressivity are not inherently aligned, and that stability alone is an insufficient indicator of generative quality.
arXiv Detail & Related papers (2026-01-26T15:34:50Z)
LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models [4.66260462241022]
LILAD is a novel framework for system identification that jointly guarantees stability and adaptability.<n>We evaluate LILAD on benchmark autonomous systems and demonstrate that it outperforms adaptive, robust, and non-adaptive baselines in predictive accuracy.
arXiv Detail & Related papers (2025-11-26T19:20:49Z)
React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN [43.988843102040725]
We study parameterizations of stabilizing nonlinear policies for learning-based control.<n>We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks.
arXiv Detail & Related papers (2025-06-02T00:36:24Z)
On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a fixed step size is used.<n>We provide a proof of this in the case of linear neural networks with a squared loss.<n>We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient.
arXiv Detail & Related papers (2024-02-20T16:01:42Z)
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent [16.849376037005452]
We show that saddle points in neural networks can be divided into two types, among which the Type-II saddles are especially difficult to escape from because the gradient noise vanishes at the saddle. The dynamics of SGD around these saddles are thus to leading order described by a random matrix product process. We leverage to show that saddle points can be either attractive or repulsive for SGD, and its dynamics can be classified into four different phases, depending on the signal-to-noise ratio in the gradient close to the saddle.
arXiv Detail & Related papers (2023-03-23T08:17:10Z)
Stability and Generalization for Markov Chain Stochastic Gradient Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems. We establish the optimal excess population risk bounds for both smooth and non-smooth cases. We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z)
Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning. GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients. This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees. The proposed method learns the system dynamics up to a confidence interval using feature representation. We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
arXiv Detail & Related papers (2022-06-03T17:27:04Z)
Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting. To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.