On the Identification and Optimization of Nonsmooth Superposition
Operators in Semilinear Elliptic PDEs
- URL: http://arxiv.org/abs/2306.05185v2
- Date: Fri, 2 Feb 2024 16:37:31 GMT
- Title: On the Identification and Optimization of Nonsmooth Superposition
Operators in Semilinear Elliptic PDEs
- Authors: Constantin Christof and Julia Kowalczyk
- Abstract summary: We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE)
In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a-priori only known to be an element of $H leakyloc(mathbbR)$.
- Score: 3.045851438458641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study an infinite-dimensional optimization problem that aims to identify
the Nemytskii operator in the nonlinear part of a prototypical semilinear
elliptic partial differential equation (PDE) which minimizes the distance
between the PDE-solution and a given desired state. In contrast to previous
works, we consider this identification problem in a low-regularity regime in
which the function inducing the Nemytskii operator is a-priori only known to be
an element of $H^1_{loc}(\mathbb{R})$. This makes the studied problem class a
suitable point of departure for the rigorous analysis of training problems for
learning-informed PDEs in which an unknown superposition operator is
approximated by means of a neural network with nonsmooth activation functions
(ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the
controls, it is possible to derive a classical stationarity system for local
minimizers and to solve the considered problem by means of a gradient
projection method. The convergence of the resulting algorithm is proven in the
function space setting. It is also shown that the established first-order
necessary optimality conditions imply that locally optimal superposition
operators share various characteristic properties with commonly used activation
functions: They are always sigmoidal, continuously differentiable away from the
origin, and typically possess a distinct kink at zero. The paper concludes with
numerical experiments which confirm the theoretical findings.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Finite Operator Learning: Bridging Neural Operators and Numerical Methods for Efficient Parametric Solution and Optimization of PDEs [0.0]
We introduce a method that combines neural operators, physics-informed machine learning, and standard numerical methods for solving PDEs.
We can parametrically solve partial differential equations in a data-free manner and provide accurate sensitivities.
Our study focuses on the steady-state heat equation within heterogeneous materials.
arXiv Detail & Related papers (2024-07-04T21:23:12Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Learning via nonlinear conjugate gradients and depth-varying neural ODEs [5.565364597145568]
The inverse problem of supervised reconstruction of depth-variable parameters in a neural ordinary differential equation (NODE) is considered.
The proposed parameter reconstruction is done for a general first order differential equation by minimizing a cost functional.
The sensitivity problem can estimate changes in the network output under perturbation of the trained parameters.
arXiv Detail & Related papers (2022-02-11T17:00:48Z) - Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing allally designed components in the graph with backprop-optimized neural function approximators.
We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes.
We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z) - A proof of convergence for the gradient descent optimization method with
random initializations in the training of neural networks with ReLU
activation for piecewise linear target functions [3.198144010381572]
Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation.
arXiv Detail & Related papers (2021-08-10T12:01:37Z) - An Operator-Splitting Method for the Gaussian Curvature Regularization
Model with Applications in Surface Smoothing and Imaging [6.860238280163609]
We propose an operator-splitting method for a general Gaussian curvature model.
The proposed method is not sensitive to the choice of parameters, its efficiency and performances being demonstrated.
arXiv Detail & Related papers (2021-08-04T08:59:41Z) - Solving high-dimensional eigenvalue problems using deep neural networks:
A diffusion Monte Carlo like approach [14.558626910178127]
The eigenvalue problem is reformulated as a fixed point problem of the semigroup flow induced by the operator.
The method shares a similar spirit with diffusion Monte Carlo but augments a direct approximation to the eigenfunction through neural-network ansatz.
Our approach is able to provide accurate eigenvalue and eigenfunction approximations in several numerical examples.
arXiv Detail & Related papers (2020-02-07T03:08:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.