FANoS: Friction-Adaptive Nosé--Hoover Symplectic Momentum for Stiff Objectives
- URL: http://arxiv.org/abs/2601.00889v1
- Date: Wed, 31 Dec 2025 11:49:49 GMT
- Title: FANoS: Friction-Adaptive Nosé--Hoover Symplectic Momentum for Stiff Objectives
- Authors: Nalin Dhiman
- Abstract summary: We study a physics-inspired optimizer, FANoS, which combines a momentum update written as a discretized second-order dynamical system with a Nosé--Hoover-like thermostat that adapts a scalar friction coefficient. We provide the algorithm and limited theoretical observations in idealized settings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a physics-inspired optimizer, \emph{FANoS} (Friction-Adaptive Nosé--Hoover Symplectic momentum), which combines (i) a momentum update written as a discretized second-order dynamical system, (ii) a Nosé--Hoover-like thermostat variable that adapts a scalar friction coefficient using kinetic-energy feedback, and (iii) a semi-implicit (symplectic-Euler) integrator, optionally with a diagonal RMS preconditioner. The method is motivated by structure-preserving integration and thermostat ideas from molecular dynamics, but is used here purely as an optimization heuristic. We provide the algorithm and limited theoretical observations in idealized settings. On the deterministic Rosenbrock-100D benchmark with 3000 gradient evaluations, FANoS-RMS attains a mean final objective value of $1.74\times 10^{-2}$, improving substantially over unclipped AdamW ($48.50$) and SGD+momentum ($90.76$) in this protocol. However, AdamW with gradient clipping is stronger, reaching $1.87\times 10^{-3}$, and L-BFGS reaches $\approx 4.4\times 10^{-10}$. On ill-conditioned convex quadratics and in a small PINN warm-start suite (Burgers and Allen--Cahn), the default FANoS configuration underperforms AdamW and can be unstable or high-variance. Overall, the evidence supports a conservative conclusion: FANoS is an interpretable synthesis of existing ideas that can help on some stiff nonconvex valleys, but it is not a generally superior replacement for modern baselines, and its behavior is sensitive to temperature-schedule and hyperparameter choices.
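The abstract names FANoS's three ingredients but not its exact update equations. A minimal sketch of how they might fit together, assuming an RMSProp-style diagonal accumulator and a standard Nosé--Hoover kinetic-energy feedback (the function name, coupling, and default constants are illustrative guesses, not the paper's actual algorithm):

```python
import numpy as np

def fanos_step(x, v, zeta, grad, r, dt=0.1, Q=1.0, T=0.0, eps=1e-8, beta=0.99):
    """One hypothetical FANoS-RMS step (a sketch, not the paper's exact update).

    x    : parameters
    v    : momentum (velocity)
    zeta : scalar Nose--Hoover friction variable
    grad : gradient of the objective at x
    r    : running RMS accumulator for the diagonal preconditioner
    Q    : thermostat 'mass' controlling how fast friction adapts
    T    : target 'temperature' (target kinetic energy per degree of freedom)
    """
    n = x.size
    # Diagonal RMS preconditioner (RMSProp-style second-moment accumulator).
    r = beta * r + (1.0 - beta) * grad**2
    precond_grad = grad / (np.sqrt(r) + eps)
    # Semi-implicit (symplectic-Euler) step: update momentum first, then
    # position using the *new* momentum.
    v = v - dt * (precond_grad + zeta * v)
    x = x + dt * v
    # Thermostat: kinetic-energy feedback drives zeta; friction grows when
    # the mean squared velocity exceeds the target T and shrinks otherwise.
    kinetic = 0.5 * np.dot(v, v)
    zeta = zeta + (dt / Q) * (2.0 * kinetic / n - T)
    return x, v, zeta, r
```

With $T=0$ the friction variable can only grow while the system is moving, so the dynamics is purely dissipative and the iterate settles; a positive $T$ would instead maintain a target level of "kinetic energy", which is presumably where the abstract's temperature-schedule sensitivity enters.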
Related papers
- Balancing Centralized Learning and Distributed Self-Organization: A Hybrid Model for Embodied Morphogenesis [0.0]
We investigate how to couple a learnable ``brain-like'' controller to a ``cell-like'' Gray--Scott substrate to steer pattern formation with minimal effort. A compact convolutional policy is embedded in a differentiable PyTorch reaction--diffusion simulator.
arXiv Detail & Related papers (2025-11-13T09:05:27Z) - Evidence that the Quantum Approximate Optimization Algorithm Optimizes the Sherrington-Kirkpatrick Model Efficiently in the Average Case [3.4872784636892047]
The Sherrington--Kirkpatrick (SK) model serves as a foundational framework for understanding disordered systems. The Quantum Approximate Optimization Algorithm (QAOA) is a quantum optimization algorithm whose performance monotonically improves with its depth $p$. We analyze QAOA applied to the SK model in the infinite-size limit and provide numerical evidence that it obtains a $(1-\epsilon)$ approximation to the optimal energy with circuit depth $\mathcal{O}(n/\epsilon^{1.13})$ in the average case.
arXiv Detail & Related papers (2025-05-12T18:00:01Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a central server. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization [3.8919212824749296]
We show that the proposed SFOM can accelerate optimization by exploiting the high-order smoothness of the objective function $f$. To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration.
arXiv Detail & Related papers (2024-12-19T03:22:47Z) - Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions [48.58317905849438]
Predicting the change in binding free energy ($\Delta\Delta G$) is crucial for understanding and modulating protein-protein interactions.
We propose the Boltzmann Alignment technique to transfer knowledge from pre-trained inverse folding models to $\Delta\Delta G$ prediction.
arXiv Detail & Related papers (2024-10-12T14:13:42Z) - Inertial Confinement Fusion Forecasting via Large Language Models [48.76222320245404]
In this study, we introduce $\textbf{LPI-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms.
We propose the \textit{LLM-anchored Reservoir}, augmented with a \textit{Fusion-specific Prompt}, enabling accurate forecasting of $\texttt{LPI}$-generated hot-electron dynamics during implosion.
We also present $\textbf{LPI4AI}$, the first $\texttt{LPI}$ benchmark based
arXiv Detail & Related papers (2024-07-15T05:46:44Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic optimization problems where the objective is an expectation over smooth, nonconvex loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Robust Online Control with Model Misspecification [96.23493624553998]
We study online control of an unknown nonlinear dynamical system with model misspecification.
Our study focuses on robustness, which measures how much deviation from the assumed linear approximation can be tolerated.
arXiv Detail & Related papers (2021-07-16T07:04:35Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that the unbiased gradient estimators, including SAGA and SVRG, based HMC methods achieve highest gradient efficiency with small batch size.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full gradient and stochastic gradient HMC approaches.
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random shuffling SGD after finite time.
To the best of our knowledge, this is the first work to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv Detail & Related papers (2020-06-12T09:39:47Z) - Controlling Rayleigh-B\'enard convection via Reinforcement Learning [62.997667081978825]
The identification of effective control strategies to suppress or enhance the convective heat exchange under fixed external thermal gradients is an outstanding fundamental and technological issue.
In this work, we explore a novel approach, based on a state-of-the-art Reinforcement Learning (RL) algorithm.
We show that our RL-based control is able to stabilize the conductive regime and delay the onset of convection to a higher Rayleigh number.
arXiv Detail & Related papers (2020-03-31T16:39:25Z) - Momentum Improves Normalized SGD [51.27183254738711]
We show that adding momentum provably removes the need for large batch sizes on nonconvex objectives.
We show that our method is effective when employed on popular large scale tasks such as ResNet-50 and BERT pretraining.
arXiv Detail & Related papers (2020-02-09T07:00:54Z)
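The "Momentum Improves Normalized SGD" entry above can be made concrete. A minimal sketch, assuming the momentum vector is normalized to unit length before each step (the paper's exact averaging constants and normalization may differ):

```python
import numpy as np

def normalized_sgd_momentum(grad_fn, x0, lr=0.1, beta=0.9, steps=100, eps=1e-12):
    """Sketch of normalized SGD with momentum: maintain an exponential
    moving average of gradients and step a fixed distance lr along its
    unit direction. Hyperparameter defaults here are illustrative."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        m = beta * m + (1.0 - beta) * g             # momentum (EMA of gradients)
        x = x - lr * m / (np.linalg.norm(m) + eps)  # unit-length step direction
    return x
```

Because the step length is fixed at `lr` regardless of gradient magnitude, the iterate approaches a minimizer at a constant rate and then oscillates in a neighborhood whose size scales with `lr`, which matches the intuition that normalization substitutes for the variance reduction a large batch would provide.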
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.