Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
- URL: http://arxiv.org/abs/2510.26997v1
- Date: Thu, 30 Oct 2025 20:56:35 GMT
- Title: Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
- Authors: John J. Vastola, Samuel J. Gershman, Kanaka Rajan
- Abstract summary: We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes. A range of well-known rules emerge naturally within this framework under different assumptions. We show that continual learning strategies like weight resetting can be understood as optimal responses to task uncertainty.
- Score: 8.844699137494105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered optimal? We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes, and identifies optimal rules as solutions to an associated optimal control problem. A range of well-known rules emerge naturally within this framework under different assumptions: gradient descent from short-horizon optimization, momentum from longer-horizon planning, natural gradients from accounting for parameter space geometry, non-gradient rules from partial controllability, and adaptive optimizers like Adam from online Bayesian inference of loss landscape shape. We further show that continual learning strategies like weight resetting can be understood as optimal responses to task uncertainty. By unifying these phenomena under a single objective, our framework clarifies the computational structure of learning and offers a principled foundation for designing adaptive algorithms.
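As a rough illustration of two of the rules named in the abstract, the sketch below runs the standard textbook forms of gradient descent and heavy-ball momentum on a toy quadratic loss. It does not reproduce the paper's optimal-control derivation. The loss, the function names, and the hyperparameters (`lr`, `beta`, `CURVATURES`, `START`) are all assumptions chosen for illustration.

```python
# Illustrative only: textbook update rules on a toy quadratic loss,
# not the derivation or code from the paper.

def loss(theta, curvatures):
    """Toy loss L(theta) = 0.5 * sum_i c_i * theta_i**2."""
    return 0.5 * sum(c * t * t for c, t in zip(curvatures, theta))

def grad(theta, curvatures):
    """Gradient of the toy loss: dL/dtheta_i = c_i * theta_i."""
    return [c * t for c, t in zip(curvatures, theta)]

def gradient_descent(theta, curvatures, lr=0.1, steps=100):
    """Plain gradient descent: step against the current gradient."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta, curvatures)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def momentum(theta, curvatures, lr=0.1, beta=0.9, steps=100):
    """Heavy-ball momentum: the velocity accumulates past gradients."""
    theta = list(theta)
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta, curvatures)
        velocity = [beta * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta

# Hypothetical ill-conditioned landscape: one steep and one shallow direction.
CURVATURES = [1.0, 0.01]
START = [1.0, 1.0]

gd_final = gradient_descent(START, CURVATURES)
mom_final = momentum(START, CURVATURES)

print("start loss:   ", loss(START, CURVATURES))
print("GD loss:      ", loss(gd_final, CURVATURES))
print("momentum loss:", loss(mom_final, CURVATURES))
```

On this particular toy problem, plain gradient descent makes slow progress along the shallow direction, while the accumulated velocity lets momentum move further in the same number of steps. That is only an informal picture of why a longer planning horizon might favour momentum-like rules.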
Related papers
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv Detail & Related papers (2026-02-07T10:19:36Z)
- OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning [12.77713716713937]
We provide a unified theoretical framework that characterizes the statistical properties of commonly used policy-gradient estimators. We derive an adaptive learning-rate schedule governed by the signal-to-noise ratio (SNR) of gradients. We further show that the variance-optimal baseline is a gradient-weighted estimator, offering a new principle for variance reduction.
arXiv Detail & Related papers (2025-11-28T16:09:28Z)
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation [50.34670342434884]
We propose a novel methodology for modeling posterior drift through Bayes decision rules. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. We illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules.
arXiv Detail & Related papers (2025-08-28T16:03:06Z)
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem. We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
- Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings.
We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Discounted Adaptive Online Learning: Towards Better Regularization [5.5899168074961265]
We study online learning in adversarial nonstationary environments.
We propose an adaptive (i.e., instance optimal) algorithm that improves the widespread non-adaptive baseline.
We also consider the (Gibbs and Candes, 2021)-style online conformal prediction problem.
arXiv Detail & Related papers (2024-02-05T04:29:39Z)
- Reinforcement Logic Rule Learning for Temporal Point Processes [17.535382791003176]
We propose a framework that can incrementally expand the explanatory temporal logic rule set to explain the occurrence of temporal events.
The proposed algorithm alternates between a master problem, where the current rule set weights are updated, and a subproblem, where a new rule is searched and included to best increase the likelihood.
We evaluate our methods on both synthetic and real healthcare datasets, obtaining promising results.
arXiv Detail & Related papers (2023-08-11T12:05:32Z)
- Introduction to Online Control [31.67032731719622]
In online nonstochastic control, both the cost functions and the perturbations from the assumed dynamical model are chosen by an adversary. The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z)
- Analyzing the discrepancy principle for kernelized spectral filter learning algorithms [2.132096006921048]
We study the discrepancy principle, as well as modifications based on smoothed residuals, for kernelized spectral filter learning algorithms.
Our main theoretical bounds are oracle inequalities established for the empirical estimation error (fixed design) and for the prediction error (random design).
arXiv Detail & Related papers (2020-04-17T20:08:44Z)
- Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms [27.002742106701863]
We propose an adaptive stopping rule for kernel-based gradient descent algorithms.
We analyze the performance of the adaptive stopping rule in the framework of learning theory.
arXiv Detail & Related papers (2020-01-09T08:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.