Bilevel optimization for learning hyperparameters: Application to solving PDEs and inverse problems with Gaussian processes
- URL: http://arxiv.org/abs/2510.05568v1
- Date: Tue, 07 Oct 2025 04:22:09 GMT
- Title: Bilevel optimization for learning hyperparameters: Application to solving PDEs and inverse problems with Gaussian processes
- Authors: Nicholas H. Nelsen, Houman Owhadi, Andrew M. Stuart, Xianjin Yang, Zongren Zou
- Abstract summary: Kernel- and neural network-based approaches for partial differential equations (PDEs), inverse problems, and supervised learning tasks depend crucially on the choice of hyperparameters. We propose an efficient strategy for hyperparameter optimization within the bilevel framework by employing a Gauss-Newton linearization of the inner optimization step. Our approach provides closed-form updates, eliminating the need for repeated costly PDE solves.
- Score: 4.197402763771375
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Methods for solving scientific computing and inference problems, such as kernel- and neural network-based approaches for partial differential equations (PDEs), inverse problems, and supervised learning tasks, depend crucially on the choice of hyperparameters. In particular, the accuracy, stability, and generalization properties of such methods are strongly affected by this choice. While bilevel optimization offers a principled framework for hyperparameter tuning, its nested optimization structure can be computationally demanding, especially in PDE-constrained contexts. In this paper, we propose an efficient strategy for hyperparameter optimization within the bilevel framework by employing a Gauss-Newton linearization of the inner optimization step. Our approach provides closed-form updates, eliminating the need for repeated costly PDE solves. As a result, each iteration of the outer loop reduces to a single linearized PDE solve, followed by explicit gradient-based hyperparameter updates. We demonstrate the effectiveness of the proposed method through Gaussian process models applied to nonlinear PDEs and to PDE inverse problems. Extensive numerical experiments highlight substantial improvements in accuracy and robustness compared to conventional random hyperparameter initialization. In particular, experiments with additive kernels and neural network-parameterized deep kernels demonstrate the method's scalability and effectiveness for high-dimensional hyperparameter optimization.
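Since the abstract compresses the key computational idea into two sentences, here is a minimal, hedged sketch (not the authors' code) of the structure it describes: once the inner problem is linearized, its minimizer has a closed form, so the outer hypergradient can be obtained by differentiating straight through a single linear solve. Kernel regression stands in for the linearized PDE collocation solve; all names, data, and constants below are illustrative.
```python
import jax
import jax.numpy as jnp

def rbf_kernel(x, z, lengthscale):
    d = x[:, None] - z[None, :]
    return jnp.exp(-0.5 * (d / lengthscale) ** 2)

def inner_solve(lengthscale, x_train, y_train, reg=1e-8):
    # Closed-form minimizer of the regularized kernel least-squares inner problem.
    K = rbf_kernel(x_train, x_train, lengthscale)
    return jnp.linalg.solve(K + reg * jnp.eye(x_train.size), y_train)

def outer_loss(log_lengthscale, x_train, y_train, x_val, y_val):
    # Outer objective: validation misfit of the inner solution.
    ell = jnp.exp(log_lengthscale)                  # keep the lengthscale positive
    alpha = inner_solve(ell, x_train, y_train)
    pred = rbf_kernel(x_val, x_train, ell) @ alpha
    return jnp.mean((pred - y_val) ** 2)

# Toy data: noisy samples of sin(x).
key = jax.random.PRNGKey(0)
x_train = jnp.linspace(0.0, 6.0, 30)
y_train = jnp.sin(x_train) + 0.05 * jax.random.normal(key, (30,))
x_val = jnp.linspace(0.3, 5.7, 20)
y_val = jnp.sin(x_val)

grad_fn = jax.grad(outer_loss)                      # hypergradient via autodiff
theta = jnp.log(2.0)                                # initial log-lengthscale
for _ in range(200):
    theta = theta - 0.1 * grad_fn(theta, x_train, y_train, x_val, y_val)
print("tuned lengthscale:", float(jnp.exp(theta)))
```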
Related papers
- Learning to Solve Optimization Problems Constrained with Partial Differential Equations [45.143085119200265]
Partial differential equation (PDE)-constrained optimization arises in many scientific and engineering domains. This paper introduces a learning-based framework that integrates a dynamic predictor with an optimization surrogate.
arXiv Detail & Related papers (2025-09-29T10:28:14Z)
- Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods [14.791541465418263]
We propose learning a solver, i.e., solving partial differential equations (PDEs) using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm so that it automatically adapts to each PDE instance (a toy sketch follows). We demonstrate the effectiveness of our approach through empirical experiments on multiple datasets.
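As a toy of that idea (not the paper's architecture), one can meta-learn a per-coordinate scaling that conditions the physics-informed gradient step across instances; the linear "PDE instances" below are invented stand-ins.
```python
import jax
import jax.numpy as jnp

def residual_loss(u, A, b):
    # Stand-in physics-informed loss for one (linear) PDE instance.
    return jnp.sum((A @ u - b) ** 2)

def rollout(log_scales, A, b, n_steps=5):
    # Run a few conditioned gradient steps and report the final residual.
    u = jnp.zeros(b.size)
    for _ in range(n_steps):
        g = jax.grad(residual_loss)(u, A, b)
        u = u - jnp.exp(log_scales) * g     # learned per-coordinate step sizes
    return residual_loss(u, A, b)

def meta_loss(log_scales, instances):
    # Average final residual over a batch of PDE instances.
    return jnp.mean(jnp.array([rollout(log_scales, A, b) for A, b in instances]))

# Meta-training step (illustrative):
#   log_scales = log_scales - lr * jax.grad(meta_loss)(log_scales, instances)
```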
arXiv Detail & Related papers (2024-10-09T12:28:32Z)
- Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization [49.73341101297818]
Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem.
The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT).
We study the error of the IFT method and analyze two strategies to reduce this error.
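The IFT hypergradient this entry studies fits in a dozen lines: at an inner optimum $w^*(t)$, $dL/dt = -(\partial^2 g/\partial w\,\partial t)^\top (\partial^2 g/\partial w^2)^{-1}\, \partial L/\partial w$. The sketch below uses a ridge-regression inner problem; the setup and names are invented for the demo.
```python
import jax
import jax.numpy as jnp

def inner_obj(w, t, X, y):
    # Inner problem g: ridge regression, t = log regularization strength.
    return jnp.sum((X @ w - y) ** 2) + jnp.exp(t) * jnp.sum(w ** 2)

def outer_obj(w, Xv, yv):
    # Outer problem L: validation misfit.
    return jnp.sum((Xv @ w - yv) ** 2)

def ift_hypergradient(t, X, y, Xv, yv):
    d = X.shape[1]
    # Inner solution in closed form (normal equations).
    w_star = jnp.linalg.solve(X.T @ X + jnp.exp(t) * jnp.eye(d), X.T @ y)
    H = jax.hessian(inner_obj, argnums=0)(w_star, t, X, y)      # d^2 g / dw^2
    mixed = jax.jacobian(jax.grad(inner_obj, argnums=0), argnums=1)(
        w_star, t, X, y)                                        # d^2 g / dw dt
    gL = jax.grad(outer_obj)(w_star, Xv, yv)                    # dL / dw
    return -mixed @ jnp.linalg.solve(H, gL)
```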
arXiv Detail & Related papers (2024-02-26T17:09:18Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
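For concreteness, the OWA operator the entry refers to is just a fixed weighted sum of sorted outcomes; the weights in the usage comment are illustrative.
```python
import jax.numpy as jnp

def owa(x, w):
    # Ascending sort, so w[0] weights the worst outcome; a nonincreasing w
    # emphasizes the worst-off, which is what makes OWA fairness-inducing
    # (and nondifferentiable, because of the sort).
    return jnp.sort(x) @ w

# Example: owa(jnp.array([0.9, 0.2, 0.5]), jnp.array([0.5, 0.3, 0.2]))
```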
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Efficient PDE-Constrained optimization under high-dimensional uncertainty using derivative-informed neural operators [6.296120102486062]
We propose a novel framework for solving large-scale partial differential equations (PDEs) with high-dimensional random parameters.
We refer to such neural operators as multi-input reduced basis derivative informed neural operators (MR-DINOs)
We show that MR-DINOs offer $10^3$--$10^7\times$ reductions in execution time, and are able to produce OUU solutions of accuracy comparable to those from standard PDE-based solutions.
arXiv Detail & Related papers (2023-05-31T17:26:20Z)
- Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization using Broyden's Hypergradients [29.487375792661005]
We present a novel bi-level optimization framework to solve PDE constrained optimization problems.
For the inner loop optimization, we adopt PINNs to solve the PDE constraints only.
For the outer loop, we design a novel method using Broyden's method, based on the Implicit Function Theorem, to approximate the hypergradients.
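As a rough illustration of that outer-loop ingredient, here is a textbook "good Broyden" iteration (not the paper's implementation) for solving the linear system $Hv = g$ at the heart of IFT-based hypergradients, using only matrix-vector products with $H$.
```python
import jax.numpy as jnp

def broyden_solve(matvec, g, x0, iters=50, tol=1e-10):
    x = x0
    Jinv = jnp.eye(g.size)                 # running inverse-Jacobian estimate
    f = matvec(x) - g                      # residual of H x - g = 0
    for _ in range(iters):
        dx = -Jinv @ f
        x_new = x + dx
        f_new = matvec(x_new) - g
        if jnp.linalg.norm(f_new) < tol:
            return x_new
        df = f_new - f
        # Rank-one update keeping Jinv consistent with the new secant pair.
        u = (dx - Jinv @ df) / (dx @ Jinv @ df)
        Jinv = Jinv + jnp.outer(u, dx @ Jinv)
        x, f = x_new, f_new
    return x

# Usage: v = broyden_solve(lambda z: H @ z, g, jnp.zeros_like(g))
```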
arXiv Detail & Related papers (2022-09-15T06:21:24Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian.
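A small illustration of that claim for the Lasso (data, step size, and iteration count are illustrative): run proximal gradient descent, then take the forward-mode Jacobian of the final iterate with respect to the regularization strength; as the iterates converge, so does this Jacobian.
```python
import jax
import jax.numpy as jnp

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return jnp.sign(z) * jnp.maximum(jnp.abs(z) - t, 0.0)

def lasso_pgd(lam, X, y, n_steps=300):
    step = 1.0 / jnp.linalg.norm(X, ord=2) ** 2   # 1/L with L = ||X||_2^2
    beta = jnp.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Iterative differentiation of the n_steps-th iterate w.r.t. lam:
# jac = jax.jacfwd(lasso_pgd)(0.1, X, y)
```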
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Speeding up Computational Morphogenesis with Online Neural Synthetic Gradients [51.42959998304931]
A wide range of modern science and engineering applications are formulated as optimization problems with a system of partial differential equations (PDEs) as constraints.
These PDE-constrained optimization problems are typically solved in a standard discretize-then-optimize approach.
We propose a general framework to speed up PDE-constrained optimization using online neural synthetic gradients (ONSG) with a novel two-scale optimization scheme.
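Very loosely, a two-scale synthetic-gradient loop alternates a coarse scale (occasional exact, expensive gradients that refit a cheap model) with a fine scale (many steps driven by the model's predicted gradients). The sketch below invents every detail (a quadratic stand-in for the PDE objective, a crude rank-one linear gradient model, the refresh period) and is only meant to convey that structure.
```python
import jax
import jax.numpy as jnp

def expensive_grad(x):
    # Stand-in for a gradient whose evaluation would need a PDE solve.
    return jax.grad(lambda z: jnp.sum((z - 1.0) ** 2))(x)

def run(x0, steps=100, period=10, lr=0.1):
    x = x0
    W = jnp.zeros((x0.size, x0.size))     # linear synthetic-gradient model g ~ W x
    for k in range(steps):
        if k % period == 0:               # coarse scale: true gradient + refit
            g = expensive_grad(x)
            W = jnp.outer(g, x) / (x @ x + 1e-12)   # crude rank-one online fit
        else:                             # fine scale: cheap synthetic gradient
            g = W @ x
        x = x - lr * g
    return x

# run(jnp.array([5.0, -3.0]))
```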
arXiv Detail & Related papers (2021-04-25T22:43:51Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an $\mathcal{A}$-based constrained optimization problem, where $\mathcal{A}$ is a black-box training algorithm.
Then, we use averaged zeroth-order hyper-gradients to update the hyperparameters.
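In sketch form (all names and constants illustrative): treat the bilevel objective $f(\theta)$, which trains a model internally, as a black box, and average random finite differences to estimate its gradient.
```python
import jax
import jax.numpy as jnp

def zo_hypergrad(f, theta, key, q=10, mu=1e-3):
    f0 = f(theta)
    u = jax.random.normal(key, (q, theta.size))     # random probe directions
    fd = jnp.array([(f(theta + mu * ui) - f0) / mu for ui in u])
    return (fd[:, None] * u).mean(axis=0)           # averaged ZO estimate

# Outer update (illustrative): theta = theta - lr * zo_hypergrad(f, theta, key)
```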
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
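For reference, a compact, generic accelerated proximal gradient iteration with a function-value restart rule, the family of methods this entry analyzes; the Lasso-type objective is an illustrative stand-in, not the paper's algorithm.
```python
import jax.numpy as jnp

def soft_threshold(z, t):
    return jnp.sign(z) * jnp.maximum(jnp.abs(z) - t, 0.0)

def apg_restart(X, y, lam, steps=300):
    step = 1.0 / jnp.linalg.norm(X, ord=2) ** 2
    obj = lambda b: 0.5 * jnp.sum((X @ b - y) ** 2) + lam * jnp.sum(jnp.abs(b))
    x = z = jnp.zeros(X.shape[1])
    t = 1.0
    for _ in range(steps):
        x_new = soft_threshold(z - step * (X.T @ (X @ z - y)), step * lam)
        t_new = 0.5 * (1.0 + (1.0 + 4.0 * t ** 2) ** 0.5)
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        if obj(x_new) > obj(x):            # restart: drop momentum if objective rose
            z, t_new = x_new, 1.0
        x, t = x_new, t_new
    return x
```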
arXiv Detail & Related papers (2020-02-26T16:06:27Z)