Related papers: Introduction to optimization methods for training SciML models

URL: http://arxiv.org/abs/2601.10222v1
Date: Thu, 15 Jan 2026 09:36:15 GMT
Title: Introduction to optimization methods for training SciML models
Authors: Alena Kopaničáková, Elisa Riccietti
Abstract summary: Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML). This document provides a unified introduction to optimization methods in ML and SciML, emphasizing how problem structure shapes algorithmic choices.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet the structure of the underlying optimization problems differs substantially across these domains. Classical ML typically relies on stochastic, sample-separable objectives that favor first-order and adaptive gradient methods. In contrast, SciML often involves physics-informed or operator-constrained formulations in which differential operators induce global coupling, stiffness, and strong anisotropy in the loss landscape. As a result, optimization behavior in SciML is governed by the spectral properties of the underlying physical models rather than by data statistics, frequently limiting the effectiveness of standard stochastic methods and motivating deterministic or curvature-aware approaches. This document provides a unified introduction to optimization methods in ML and SciML, emphasizing how problem structure shapes algorithmic choices. We review first- and second-order optimization techniques in both deterministic and stochastic settings, discuss their adaptation to physics-constrained and data-driven SciML models, and illustrate practical strategies through tutorial examples, while highlighting open research directions at the interface of scientific computing and scientific machine learning.
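The abstract's central claim, that differential operators induce stiffness which limits first-order methods and motivates curvature-aware ones, can be illustrated with a small numerical experiment. The sketch below is not taken from the paper. It assumes a 1D Poisson problem -u'' = 1 on (0, 1) with homogeneous Dirichlet conditions, discretized by second-order finite differences, and treats its quadratic energy as a stand-in for a physics-informed loss. The helper names `laplacian_1d`, `gradient_descent`, and `newton` are hypothetical. As the grid is refined, the condition number of the discrete operator grows roughly like (n+1)^2, so fixed-step gradient descent needs more and more iterations, while a single Newton step (a linear solve with the exact Hessian) reaches the minimizer regardless of n.

```python
import numpy as np


def laplacian_1d(n):
    """Hypothetical helper: finite-difference matrix for -u'' on (0, 1)
    with homogeneous Dirichlet conditions and n interior grid points."""
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2


def gradient_descent(A, b, tol=1e-6, max_iter=200_000):
    """Minimize the quadratic energy 0.5 u^T A u - b^T u with a fixed step 1/L,
    where L is the largest eigenvalue of A. Returns (u, iterations used)."""
    L = np.linalg.eigvalsh(A).max()
    u = np.zeros_like(b)
    for k in range(max_iter):
        g = A @ u - b  # gradient of the energy
        if np.linalg.norm(g) <= tol * np.linalg.norm(b):
            return u, k
        u = u - g / L
    return u, max_iter


def newton(A, b):
    """One Newton step from u = 0. The Hessian of the quadratic energy is A,
    so a single linear solve yields the exact minimizer."""
    u0 = np.zeros_like(b)
    return u0 - np.linalg.solve(A, A @ u0 - b)


if __name__ == "__main__":
    for n in (15, 31, 63):
        A = laplacian_1d(n)
        b = np.ones(n)
        eig = np.linalg.eigvalsh(A)
        _, iters = gradient_descent(A, b)
        u_newton = newton(A, b)
        res = np.linalg.norm(A @ u_newton - b) / np.linalg.norm(b)
        print(f"n={n:3d}  cond(A)={eig.max() / eig.min():8.1f}  "
              f"GD iters={iters:6d}  Newton residual={res:.1e}")
```

The gradient-descent iteration count grows by roughly a factor of four each time n is doubled, tracking cond(A); the exact counts depend on the tolerance. Here the "physics" (the spectrum of the discrete operator), not data statistics, dictates the difficulty, which is the distinction the abstract draws between ML and SciML.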

Related papers

ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv (2026-02-07T10:19:36Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv (2025-12-03T13:16:35Z)
Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW
Stochastic gradient descent (SGD) methods have long been central to training large language models (LLMs). This paper proposes a conjugate subgradient method together with adaptive sampling specifically for training LLMs.
arXiv (2025-07-01T23:30:15Z)
Reparameterized LLM Training via Orthogonal Equivalence Transformation
We present POET, a novel training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. POET can stably optimize the objective function with improved generalization. We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
arXiv (2025-06-09T17:59:34Z)
Reconstructing Physics-Informed Machine Learning for Traffic Flow Modeling: a Multi-Gradient Descent and Pareto Learning Approach
Physics-informed machine learning (PIML) is crucial in modern flow modeling. This paper introduces a paradigm shift in PIML by reformulating the training process as a multi-objective optimization problem.
arXiv (2025-05-19T15:23:24Z)
Physics-Informed Inference Time Scaling via Simulation-Calibrated Scientific Machine Learning
High-dimensional partial differential equations (PDEs) pose significant computational challenges across fields ranging from quantum chemistry to economics and finance. Although scientific machine learning (SciML) techniques offer approximate solutions, they often suffer from bias and neglect crucial physical insights. We propose Simulation-Calibrated Scientific Machine Learning (SCaSML), a framework that dynamically refines and debiases the SciML predictions during inference by enforcing the physical laws.
arXiv (2025-04-22T18:01:45Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning
We tackle the general differentiable meta-learning problem that is ubiquitous in modern deep learning. These problems are often formalized as bi-level optimization (BLO) problems. We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv (2024-10-14T12:10:06Z)
Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
We consider optimization when one only has access to biased oracles and must obtain objective estimates with low bias. We show that biased gradient methods can reduce variance in the non-varied regime. We also show that conditional optimization methods significantly improve the best-known complexities in the literature for conditional optimization and risk optimization.
arXiv (2024-08-20T17:56:16Z)
Conservative Objective Models for Effective Offline Model-Based Optimization
Computational design problems arise in a number of settings, from synthetic biology to computer architectures. We propose a method that learns a model of the objective function that lower-bounds the actual value of the ground-truth objective on out-of-distribution inputs. Conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
arXiv (2021-07-14T17:55:28Z)
Meta-Learning with Neural Tangent Kernels
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK). Within this paradigm, we introduce two meta-learning algorithms that no longer need the sub-optimal iterative inner-loop adaptation of the MAML framework. We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS and 2) solving the adaptation analytically based on NTK theory.
arXiv (2021-02-07T20:53:23Z)
Macroscopic Traffic Flow Modeling with Physics Regularized Gaussian Process: A New Insight into Machine Learning Applications
This study presents a new modeling framework, named physics regularized machine learning (PRML), to encode classical traffic flow models into the machine learning architecture. To demonstrate the effectiveness of the proposed model, this paper conducts empirical studies on a real-world dataset collected from a stretch of the I-15 freeway in Utah.
arXiv (2020-02-06T17:22:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.