A Globally Convergent Algorithm for Neural Network Parameter
Optimization Based on Difference-of-Convex Functions
- URL: http://arxiv.org/abs/2401.07936v1
- Date: Mon, 15 Jan 2024 19:53:35 GMT
- Title: A Globally Convergent Algorithm for Neural Network Parameter
Optimization Based on Difference-of-Convex Functions
- Authors: Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel
- Abstract summary: We propose an algorithm for optimizing parameters of hidden layer networks.
Specifically, we derive a blockwise (DC-of-the-art) difference function.
- Score: 29.58728073957055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an algorithm for optimizing the parameters of single hidden layer
neural networks. Specifically, we derive a blockwise difference-of-convex (DC)
functions representation of the objective function. Based on the latter, we
propose a block coordinate descent (BCD) approach that we combine with a
tailored difference-of-convex functions algorithm (DCA). We prove global
convergence of the proposed algorithm. Furthermore, we mathematically analyze
the convergence rate of parameters and the convergence rate in value (i.e., the
training loss). We give conditions under which our algorithm converges linearly
or even faster depending on the local shape of the loss function. We confirm
our theoretical derivations numerically and compare our algorithm against
state-of-the-art gradient-based solvers in terms of both training loss and test
loss.
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems convex compact sets.
We leverage the structure of the objective by handling differentiable and non-differentiable separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - An Algebraically Converging Stochastic Gradient Descent Algorithm for
Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebra and tuning in the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Aligning Partially Overlapping Point Sets: an Inner Approximation
Algorithm [80.15123031136564]
We propose a robust method to align point sets where there is no prior information about the value of the transformation.
Our algorithm does not need regularization on transformation, and thus can handle the situation where there is no prior information about the values of the transformations.
Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized
Regression [0.0]
This paper proposes a working set algorithm for non-regular regularizers with convergence guarantees.
Our results demonstrate high gain compared to the full problem solver for both block-coordinates or a gradient solver.
arXiv Detail & Related papers (2020-06-24T07:40:31Z) - Optimal and Practical Algorithms for Smooth and Strongly Convex
Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network.
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.