Limits of Convergence-Rate Control for Open-Weight Safety
- URL: http://arxiv.org/abs/2602.18868v1
- Date: Sat, 21 Feb 2026 15:32:27 GMT
- Title: Limits of Convergence-Rate Control for Open-Weight Safety
- Authors: Domenic Rosati, Xijie Zeng, Hong Huang, Sebastian Dionicio, Subhabrata Majumdar, Frank Rudzicz, Hassan Sajjad
- Abstract summary: We develop an algorithm that can both provably and empirically slow first- and second-order optimization in non-adversarial settings. In adversarial settings, we establish a fundamental limit on a broad class of convergence rate control methods.
- Score: 23.243652317091456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-weight foundation models can be fine-tuned for harmful purposes after release, yet no existing training resistance methods provide theoretical guarantees. Treating these interventions as convergence-rate control problems allows us to connect optimization speed to the spectral structure of model weights. We leverage this insight to develop a novel understanding of convergence-rate control through spectral reparameterization and derive an algorithm, SpecDef, that can both provably and empirically slow first- and second-order optimization in non-adversarial settings. In adversarial settings, we establish a fundamental limit on a broad class of convergence-rate control methods, including our own: an attacker with sufficient knowledge can restore fast convergence at a linear increase in model size. To overcome this limitation, future work will need to investigate methods that are not equivalent to controlling convergence rate.
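The abstract ties fine-tuning speed to the spectral structure of the weights but does not spell out SpecDef's construction, so the following is only a minimal sketch of the underlying idea, assuming a toy quadratic objective and a hypothetical diagonal scaling S; none of these names or choices come from the paper. It illustrates that an ill-conditioned spectral reparameterization slows plain gradient descent, while an attacker who knows the reparameterization can absorb it with extra parameters and recover the fast rate.

```python
# Illustrative sketch only: NOT the paper's SpecDef algorithm (its details are
# not given in the abstract). It shows the connection the abstract relies on:
# gradient-descent speed is governed by the spectrum of the objective, so a
# badly scaled spectral reparameterization slows optimization, and an attacker
# who knows the reparameterization can undo it with additional parameters.
import numpy as np

rng = np.random.default_rng(0)
d, steps = 20, 200

# Toy "fine-tuning" objective f(w) = 0.5 * w^T A w with a well-conditioned A.
A = np.diag(np.linspace(1.0, 2.0, d))   # condition number ~ 2
w0 = rng.normal(size=d)


def gd_loss(H, lr, steps, x0):
    """Plain gradient descent on f(x) = 0.5 * x^T H x; returns the final loss."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * (H @ x)
    return 0.5 * x @ H @ x


# (1) Undefended weights: fast convergence (per-step contraction factor <= 0.5).
fast = gd_loss(A, lr=0.5, steps=steps, x0=w0)

# (2) "Defended" release: reparameterize w = S v with a badly scaled diagonal S.
# The effective Hessian S A S is ill conditioned, so the same optimizer with a
# stable step size makes almost no progress along the flattest directions.
S = np.diag(np.logspace(0.0, 3.0, d))   # hypothetical defence scaling
H_slow = S @ A @ S
slow = gd_loss(H_slow, lr=1.0 / np.linalg.eigvalsh(H_slow).max(),
               steps=steps, x0=np.linalg.solve(S, w0))

# (3) Attacker who knows S: compose the released parameterization with extra
# parameters implementing S^{-1} (a linear increase in parameters in this toy),
# which restores the original well-conditioned problem and the fast rate.
recovered = gd_loss(A, lr=0.5, steps=steps, x0=w0)

print(f"final loss  undefended: {fast:.2e}  defended: {slow:.2e}  attacked: {recovered:.2e}")
```

In this toy, the undefended and attacked runs drive the loss to near zero within 200 steps, while the defended run retains a loss several orders of magnitude higher, which mirrors the qualitative behaviour the abstract describes for non-adversarial versus knowledgeable-adversary settings.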
Related papers
- Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins [31.581870065866568]
We study safe online reinforcement learning in Constrained Markov Decision Processes (CMDPs) under strong regret and violation metrics.
Existing primal-dual methods that achieve sublinear strong reward regret incur growing strong constraint violation or are restricted to average-iterate convergence due to inherent oscillations.
We propose the Flexible safety Domain Optimization via Margin-regularized Exploration (FlexDOME) algorithm, the first to provably achieve near-constant $\tilde{O}(1)$ strong constraint violation alongside sublinear strong regret and non-asymptotic last-iterate convergence.
arXiv Detail & Related papers (2026-02-11T14:54:26Z)
- Comparing and correcting robustness metrics for quantum optimal control [1.6927349660459692]
We present a novel, systematic study demonstrating important numerical differences between adjoint end-point and toggling-frame approaches.
We also introduce a critical discretization correction to the widely-used robustness-frame estimator.
Our approach uniquely handles control and fidelity constraints while cleanly isolating robustness for dedicated optimization.
arXiv Detail & Related papers (2026-02-10T22:44:16Z)
- Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization [7.2620484413601325]
We establish sharper convergence guarantees for the Muon optimizer through a direct and simplified analysis.
Our results improve upon existing bounds by achieving faster convergence rates while covering a broader class of problem settings.
arXiv Detail & Related papers (2026-01-27T09:32:46Z)
- Verifying Closed-Loop Contractivity of Learning-Based Controllers via Partitioning [52.23804865017831]
We address the problem of verifying closed-loop contraction in nonlinear control systems whose controller and contraction metric are both parameterized by neural networks.
We derive a tractable and scalable sufficient condition for closed-loop contractivity that reduces to checking that the dominant eigenvalue of a symmetric Metzler matrix is nonpositive.
arXiv Detail & Related papers (2025-12-01T23:06:56Z)
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO).
RRPO operates directly on the primal problem without relying on dual formulations.
We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z)
- C-Learner: Constrained Learning for Causal Inference [4.370964009390564]
We propose a novel debiasing approach that achieves the best of both worlds, producing stable plug-in estimates.
Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero.
arXiv Detail & Related papers (2024-05-15T16:38:28Z)
- Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
- Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
arXiv Detail & Related papers (2021-01-23T17:58:25Z)
- On The Verification of Neural ODEs with Stochastic Guarantees [14.490826225393096]
We show that Neural ODEs, an emerging class of time-continuous neural networks, can be verified by solving a set of global-optimization problems.
We introduce Stochastic Lagrangian Reachability (SLR), an abstraction-based technique for constructing a tight Reachtube.
arXiv Detail & Related papers (2020-12-16T11:04:34Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.