Layerwise goal-oriented adaptivity for neural ODEs: an optimal control perspective
- URL: http://arxiv.org/abs/2601.07397v1
- Date: Mon, 12 Jan 2026 10:32:37 GMT
- Title: Layerwise goal-oriented adaptivity for neural ODEs: an optimal control perspective
- Authors: Michael Hintermüller, Michael Hinze, Denis Korolev,
- Abstract summary: We propose a novel layerwise adaptive construction method for neural network architectures.<n>We present results for a selection of well known examples from the literature.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a novel layerwise adaptive construction method for neural network architectures. Our approach is based on a goal--oriented dual-weighted residual technique for the optimal control of neural differential equations. This leads to an ordinary differential equation constrained optimization problem with controls acting as coefficients and a specific loss function. We implement our approach on the basis of a DG(0) Galerkin discretization of the neural ODE, leading to an explicit Euler time marching scheme. For the optimization we use steepest descent. Finally, we apply our method to the construction of neural networks for the classification of data sets, where we present results for a selection of well known examples from the literature.
Related papers
- Training Neural ODEs Using Fully Discretized Simultaneous Optimization [2.290491821371513]
Training Neural Ordinary Differential Equations (Neural ODEs) requires solving differential equations at each epoch, leading to high computational costs.<n>In particular, we employ a collocation-based, fully discretized formulation and use IPOPT-a solver for large-scale nonlinear optimization.<n>Our results show significant potential for (collocation-based) simultaneous Neural ODE training pipelines.
arXiv Detail & Related papers (2025-02-21T18:10:26Z) - Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called emphoptimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Neural Basis Functions for Accelerating Solutions to High Mach Euler
Equations [63.8376359764052]
We propose an approach to solving partial differential equations (PDEs) using a set of neural networks.
We regress a set of neural networks onto a reduced order Proper Orthogonal Decomposition (POD) basis.
These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced order approximation to the PDE.
arXiv Detail & Related papers (2022-08-02T18:27:13Z) - Neural Improvement Heuristics for Graph Combinatorial Optimization
Problems [49.85111302670361]
We introduce a novel Neural Improvement (NI) model capable of handling graph-based problems where information is encoded in the nodes, edges, or both.
The presented model serves as a fundamental component for hill-climbing-based algorithms that guide the selection of neighborhood operations for each.
arXiv Detail & Related papers (2022-06-01T10:35:29Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how the variability in solvers' space can improve neural ODEs performance.
We show that the right choice of solver parameterization can significantly affect neural ODEs models in terms of robustness to adversarial attacks.
arXiv Detail & Related papers (2021-03-15T17:26:34Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - ODEN: A Framework to Solve Ordinary Differential Equations using
Artificial Neural Networks [0.0]
We prove a specific loss function, which does not require knowledge of the exact solution, to evaluate neural networks' performance.
Neural networks are shown to be proficient at approximating continuous solutions within their training domains.
A user-friendly and adaptable open-source code (ODE$mathcalN$) is provided on GitHub.
arXiv Detail & Related papers (2020-05-28T15:34:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.