On the existence of minimizers in shallow residual ReLU neural network
optimization landscapes
- URL: http://arxiv.org/abs/2302.14690v1
- Date: Tue, 28 Feb 2023 16:01:38 GMT
- Title: On the existence of minimizers in shallow residual ReLU neural network
optimization landscapes
- Authors: Steffen Dereich, Arnulf Jentzen, Sebastian Kassing
- Abstract summary: In practically relevant learning problems, it seems advisable to design ANN architectures such that GD optimization processes remain bounded.
In particular, GD trajectories may escape to infinity if the infimum of the error function (objective function) is not attained in the optimization landscape.
This naturally raises the question of the existence of minimizers in the optimization landscape.
- Score: 2.1055643409860743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many mathematical convergence results for gradient descent (GD) based
algorithms employ the assumption that the GD process is (almost surely) bounded
and, also in concrete numerical simulations, divergence of the GD process may
slow down, or even completely rule out, convergence of the error function. In
practically relevant learning problems, it thus seems advisable to design the
ANN architectures such that GD optimization processes remain bounded. The
property of the boundedness of GD processes for a given learning
problem seems, however, to be closely related to the existence of minimizers in
the optimization landscape and, in particular, GD trajectories may escape to
infinity if the infimum of the error function (objective function) is not
attained in the optimization landscape. This naturally raises the question of
the existence of minimizers in the optimization landscape and, in the situation
of shallow residual ANNs with multi-dimensional input layers and
multi-dimensional hidden layers with the ReLU activation, the main result of
this work answers this question affirmatively for a general class of loss
functions and all continuous target functions. In the proof of this statement,
we propose a kind of closure of the search space, whose limit points we call
generalized responses. Thereafter, we provide sufficient criteria on the loss
function and the underlying probability distribution which ensure that all
additional artificial generalized responses are suboptimal, which finally
allows us to conclude the existence of minimizers in the optimization landscape.
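To make the setting concrete, the following is a minimal numerical sketch, assuming one common shallow residual ReLU parameterization (a single ReLU hidden layer plus an affine skip connection) trained by plain full-batch gradient descent on a mean squared error; the paper's actual architecture, loss class and target functions are more general than this toy setup.

```python
# A minimal numpy sketch (illustrative, not the paper's exact setup): a shallow
# residual ReLU network, here realized as a single ReLU hidden layer plus an
# affine skip connection, trained by plain full-batch gradient descent on a
# mean squared error. The parameter norm is printed because diverging GD
# trajectories are the failure mode related to non-existence of minimizers.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 3, 8, 2, 256

# samples of a continuous (here: smooth toy) target function
X = rng.uniform(-1.0, 1.0, size=(n, d_in))
Y = np.stack([np.sin(X.sum(axis=1)), np.cos(X[:, 0])], axis=1)

# parameters: hidden layer (W1, b1), output layer (W2, b2), affine skip (C, c)
W1 = rng.normal(scale=0.5, size=(d_hidden, d_in)); b1 = np.zeros(d_hidden)
W2 = rng.normal(scale=0.5, size=(d_out, d_hidden)); b2 = np.zeros(d_out)
C = rng.normal(scale=0.5, size=(d_out, d_in)); c = np.zeros(d_out)

def forward(X):
    Z = X @ W1.T + b1                       # pre-activations, shape (n, d_hidden)
    H = np.maximum(Z, 0.0)                  # ReLU
    return Z, H, H @ W2.T + b2 + X @ C.T + c

lr = 1e-2
for step in range(2001):
    Z, H, out = forward(X)
    R = out - Y                             # residuals of L = ||out - Y||^2 / (2n)
    gW2, gb2 = R.T @ H / n, R.mean(axis=0)
    gC, gc = R.T @ X / n, R.mean(axis=0)
    dH = (R @ W2) * (Z > 0)                 # backprop through the ReLU layer
    gW1, gb1 = dH.T @ X / n, dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    C -= lr * gC; c -= lr * gc
    if step % 500 == 0:
        theta_norm = np.sqrt(sum((p ** 2).sum() for p in (W1, b1, W2, b2, C, c)))
        print(f"step {step}: loss {0.5 * (R ** 2).sum() / n:.4f}, |theta| {theta_norm:.2f}")
```

The affine skip term plays the role of the residual connection; the main theorem asserts that for such architectures, a general class of loss functions and every continuous target function, the infimum of the error is attained.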
Related papers
- Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
- ProGO: Probabilistic Global Optimizer [9.772380490791635]
In this paper we develop an algorithm that converges to the global optimum under some mild conditions.
We show that the proposed algorithm outperforms, by an order of magnitude, many existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-04T22:23:40Z)
- Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks.
We link this to the phenomenon of phase transitions in generalized hardness of approximation.
Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
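For orientation, a diagonal linear network parameterizes a linear predictor through elementwise products of its weights; the following is a minimal sketch under illustrative assumptions (quadratic loss, the u*u - v*v parameterization, small initialization, explicit-Euler discretization of the gradient flow). It is not the construction of that paper, but it exhibits the basis-pursuit-like implicit bias the summary refers to.

```python
# Diagonal linear network sketch: the predictor beta = u*u - v*v is trained by a
# small-step discretization of gradient flow from a small initialization alpha.
# The limit is known to be biased toward small-l1 (basis-pursuit-like) solutions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 50
X = rng.normal(size=(n, d))
beta_true = np.zeros(d); beta_true[:3] = [1.0, -2.0, 0.5]   # sparse ground truth
y = X @ beta_true

alpha = 1e-3                          # small initialization scale
u = np.full(d, alpha); v = np.full(d, alpha)
eta = 1e-2                            # step size of the discretized flow

for _ in range(20000):
    beta = u * u - v * v
    g = X.T @ (X @ beta - y) / n      # gradient of the quadratic loss w.r.t. beta
    u, v = u - eta * 2 * u * g, v + eta * 2 * v * g

beta = u * u - v * v
print("recovered nonzeros:", np.where(np.abs(beta) > 1e-2)[0])
```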
arXiv Detail & Related papers (2023-07-13T13:27:51Z)
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
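As a toy picture of "exploring around the current state before committing to a step", the sketch below uses a plain population-based random search with placeholder numbers; the swarm dynamics and sparse Gaussian process machinery of the actual method are not modelled here.

```python
# Population-based exploration sketch (placeholder objective and step sizes):
# several candidate points probe the neighbourhood of the current iterate and
# the best probe decides the next move.
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    return np.sum((x - 2.0) ** 2)       # toy objective

x = np.zeros(4)
step, n_particles = 0.5, 16
for _ in range(200):
    candidates = x + step * rng.normal(size=(n_particles, x.size))  # explore around x
    values = np.array([f(c) for c in candidates])
    best = candidates[np.argmin(values)]
    if f(best) < f(x):
        x = best                        # move only if a probe improves on the current state

print("final objective:", f(x))
```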
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$\nu$-SDDP can significantly reduce problem-solving cost without sacrificing solution quality.
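For context, SDDP-style value functions are convex piecewise-linear maxima over affine cuts; the sketch below only evaluates such a function with hypothetical cut parameters and does not reproduce the learned $\nu$-SDDP model, which would predict the cuts from a problem instance.

```python
# Evaluating a piecewise-linear value function V(x) = max_k (A[k] . x + b[k]),
# the standard cut representation used in SDDP. Cut parameters are hypothetical.
import numpy as np

def pwl_value(x, A, b):
    """Evaluate V(x) = max over cuts of (A[k] . x + b[k])."""
    return np.max(A @ x + b)

A = np.array([[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]])   # one row per affine cut
b = np.array([0.0, 0.3, -0.2])
print(pwl_value(np.array([0.4, 0.7]), A, b))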
arXiv Detail & Related papers (2021-12-01T22:55:23Z)
- Fighting the curse of dimensionality: A machine learning approach to finding global optima [77.34726150561087]
This paper shows how to find global optima in structural optimization problems.
By exploiting certain cost functions, we either obtain the global optimum at best or obtain superior results at worst when compared to established optimization procedures.
arXiv Detail & Related papers (2021-10-28T09:50:29Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of constrained Markov decision processes (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Robust Topology Optimization Using Variational Autoencoders [2.580765958706854]
In this work, we use neural network surrogates to enable a faster solution approach via surrogate-based optimization.
We also build a Variational Autoencoder (VAE) to transform the high dimensional design space into a low dimensional one.
The resulting gradient-based optimization algorithm produces optimal designs with lower robust compliances than those observed in the training set.
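A schematic sketch of optimizing a design in the VAE's low-dimensional latent space is given below; the decoder and the compliance surrogate are simple differentiable stand-ins (a fixed linear map and a quadratic), not the trained models from that paper.

```python
# Latent-space design optimization sketch: gradient descent over a low-dimensional
# latent vector z, decoded to a high-dimensional design. `decode` and `compliance`
# are placeholder stand-ins for the trained VAE decoder and the surrogate objective.
import numpy as np

rng = np.random.default_rng(2)
d_latent, d_design = 8, 400

D = rng.normal(size=(d_design, d_latent)) / np.sqrt(d_design)   # stand-in decoder
target = rng.normal(size=d_design)                               # stand-in reference design

def decode(z):
    return D @ z

def compliance(x):
    # placeholder surrogate objective: distance to the reference design
    return 0.5 * np.sum((x - target) ** 2)

z = np.zeros(d_latent)
lr = 0.5
for step in range(100):
    x = decode(z)
    grad_x = x - target          # d compliance / d x for the quadratic stand-in
    grad_z = D.T @ grad_x        # chain rule through the (linear) decoder
    z -= lr * grad_z

print("surrogate compliance of decoded design:", compliance(decode(z)))
```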
arXiv Detail & Related papers (2021-07-19T20:40:51Z)
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under the same delay assumptions.
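The delayed-gradient mechanism behind DASGD can be simulated in a few lines; the sketch below applies, at each step, a stochastic gradient evaluated at parameters that are a fixed number of steps stale, an illustrative simplification of the asynchronous setting (the objective, noise level and delay are placeholders).

```python
# Delayed stochastic gradient descent sketch: every update uses a gradient that
# was computed at parameters `delay` steps old, as when asynchronous workers
# read stale copies of the parameter vector.
import numpy as np
from collections import deque

rng = np.random.default_rng(3)
d, delay, lr = 10, 5, 0.05
x_opt = rng.normal(size=d)

def stochastic_grad(x):
    # gradient of 0.5*||x - x_opt||^2 plus noise, standing in for a minibatch gradient
    return (x - x_opt) + 0.1 * rng.normal(size=d)

x = np.zeros(d)
stale = deque([x.copy()] * delay, maxlen=delay)   # parameters the "workers" read

for t in range(1000):
    g = stochastic_grad(stale[0])   # gradient computed at the oldest (stale) iterate
    stale.append(x.copy())          # current parameters become visible only later
    x -= lr * g

print("distance to optimum:", np.linalg.norm(x - x_opt))
```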
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
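A minimal sketch of such a randomized zero-order scheme follows; it uses the standard two-point gradient estimator along a random unit direction together with a Euclidean-ball projection, both illustrative choices rather than the exact estimator and geometry analyzed in that paper.

```python
# Zero-order (derivative-free) projected gradient sketch: only function values
# f(x +/- h*u) are queried; d * (f(x+hu) - f(x-hu)) / (2h) * u is the classical
# two-point gradient estimate. Objective, radius and step sizes are placeholders.
import numpy as np

rng = np.random.default_rng(4)
d, h, lr, radius = 5, 1e-3, 0.2, 2.0

def f(x):
    # strongly convex test objective; the optimizer never sees its gradient
    return 0.5 * np.sum((x - 1.0) ** 2)

def project(x, radius):
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

x = np.zeros(d)
for t in range(2000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                               # random unit direction
    g = d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u  # two-point estimate
    x = project(x - (lr / np.sqrt(t + 1)) * g, radius)   # decaying step + projection

print("final point:", np.round(x, 2))
```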
arXiv Detail & Related papers (2020-06-14T10:42:23Z)