Learning via nonlinear conjugate gradients and depth-varying neural ODEs
- URL: http://arxiv.org/abs/2202.05766v1
- Date: Fri, 11 Feb 2022 17:00:48 GMT
- Title: Learning via nonlinear conjugate gradients and depth-varying neural ODEs
- Authors: George Baravdish, Gabriel Eilertsen, Rym Jaroudi, B. Tomas Johansson,
Luk\'a\v{s} Mal\'y and Jonas Unger
- Abstract summary: The inverse problem of supervised reconstruction of depth-variable parameters in a neural ordinary differential equation (NODE) is considered.
The proposed parameter reconstruction is done for a general first order differential equation by minimizing a cost functional.
The sensitivity problem can estimate changes in the network output under perturbation of the trained parameters.
- Score: 5.565364597145568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inverse problem of supervised reconstruction of depth-variable
(time-dependent) parameters in a neural ordinary differential equation (NODE)
is considered, that means finding the weights of a residual network with time
continuous layers. The NODE is treated as an isolated entity describing the
full network as opposed to earlier research, which embedded it between pre- and
post-appended layers trained by conventional methods. The proposed parameter
reconstruction is done for a general first order differential equation by
minimizing a cost functional covering a variety of loss functions and penalty
terms. A nonlinear conjugate gradient method (NCG) is derived for the
minimization. Mathematical properties are stated for the differential equation
and the cost functional. The adjoint problem needed is derived together with a
sensitivity problem. The sensitivity problem can estimate changes in the
network output under perturbation of the trained parameters. To preserve
smoothness during the iterations the Sobolev gradient is calculated and
incorporated. As a proof-of-concept, numerical results are included for a NODE
and two synthetic datasets, and compared with standard gradient approaches (not
based on NODEs). The results show that the proposed method works well for deep
learning with infinite numbers of layers, and has built-in stability and
smoothness.
Related papers
- A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations [9.588717577573684]
We propose a scalable preconditioned primal hybrid gradient algorithm for solving partial differential equations (PDEs)
We compare the performance of the proposed method with several commonly used deep learning algorithms.
The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably.
arXiv Detail & Related papers (2024-11-09T20:39:10Z) - FEM-based Neural Networks for Solving Incompressible Fluid Flows and Related Inverse Problems [41.94295877935867]
numerical simulation and optimization of technical systems described by partial differential equations is expensive.
A comparatively new approach in this context is to combine the good approximation properties of neural networks with the classical finite element method.
In this paper, we extend this approach to saddle-point and non-linear fluid dynamics problems, respectively.
arXiv Detail & Related papers (2024-09-06T07:17:01Z) - A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems [0.0]
Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network.
In this paper, we propose a nonoverlapping domain decomposition method (DDM) for ELMs that not only reduces the training time of ELMs, but is also suitable for parallel computation.
arXiv Detail & Related papers (2024-06-22T23:25:54Z) - Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing allally designed components in the graph with backprop-optimized neural function approximators.
We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes.
We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z) - Improved Overparametrization Bounds for Global Convergence of Stochastic
Gradient Descent for Shallow Neural Networks [1.14219428942199]
We study the overparametrization bounds required for the global convergence of gradient descent algorithm for a class of one hidden layer feed-forward neural networks.
arXiv Detail & Related papers (2022-01-28T11:30:06Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Solving inverse-PDE problems with physics-aware neural networks [0.0]
We propose a novel framework to find unknown fields in the context of inverse problems for partial differential equations.
We blend the high expressibility of deep neural networks as universal function estimators with the accuracy and reliability of existing numerical algorithms.
arXiv Detail & Related papers (2020-01-10T18:46:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.