MinMax Networks
- URL: http://arxiv.org/abs/2306.09253v1
- Date: Thu, 15 Jun 2023 16:30:33 GMT
- Title: MinMax Networks
- Authors: Winfried Lohmiller, Philipp Gassert, Jean-Jacques Slotine
- Abstract summary: This paper describes an alternative discrete MinMax learning approach for continuous piece-wise linear functions.
We show that the convergence rate of constrained piece-wise linear function learning is equivalent to the exponential convergence rates of the individual local linear regions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While much progress has been achieved over the last decades in neuro-inspired
machine learning, there are still fundamental theoretical problems in
gradient-based learning using combinations of neurons. These problems, such as
saddle points and suboptimal plateaus of the cost function, can cause learning to
fail in both theory and practice. In addition, selecting the discrete gradient
step size is problematic: steps that are too large cause instability, while
steps that are too small slow down learning. This paper describes an
alternative discrete MinMax learning approach for continuous piece-wise linear
functions. Global exponential convergence of the algorithm is established using
Contraction Theory with Inequality Constraints, which is extended from the
continuous to the discrete case in this paper:
In contrast to deep learning, the parametrization of each linear function piece
in the proposed MinMax network is linear. This permits a linear-regression
stability proof as long as measurements do not transition from one linear
region to a neighbouring one.
The step size of the discrete gradient descent is Lagrangian-limited
orthogonally to the edge between two neighbouring linear functions. We show
that this Lagrangian step limitation does not slow the convergence of the
unconstrained system dynamics, in contrast to a step-size limitation along the
gradient direction.
We show that the convergence rate of constrained piece-wise linear function
learning is equivalent to the exponential convergence rates of the individual
local linear regions.
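To make the construction concrete, here is a minimal sketch (our own illustrative code, not the authors' implementation; shapes and names are hypothetical) of what a MinMax network computes: a min over maxes of affine pieces, each of which is linear in its parameters.

```python
import numpy as np

def minmax_net(x, W, b):
    """Evaluate f(x) = min_j max_i (W[j, i] @ x + b[j, i]).

    Each affine piece is linear in the parameters (W, b), so inside one
    active linear region the model reduces to ordinary linear regression,
    which is what enables the stability argument described above.
    """
    z = W @ x + b                      # affine responses, shape (groups, units)
    return np.min(np.max(z, axis=1))

# Example: the continuous piece-wise linear function |x| = max(x, -x),
# written as a MinMax network with one group of two affine units.
W = np.array([[[1.0], [-1.0]]])        # shape (1 group, 2 units, 1 input dim)
b = np.zeros((1, 2))
```

Within any region where the same affine unit attains the max and the same group attains the min, the output is an ordinary linear function of `(W, b)`.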
Related papers
- Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth [12.452887246184318]
We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function.
The adaptive stepsize we propose arises from an intriguing decomposition theorem.
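The paper's stepsize rule comes from its decomposition theorem, which is not reproduced here; as a generic illustration of adaptive-stepsize gradient descent on a function with fourth-order growth, a standard Armijo backtracking scheme can be sketched (our choice of method, not the paper's):

```python
import numpy as np

def gd_backtracking(f, grad, x, iters=500, t0=1.0, beta=0.5, c=1e-4):
    """Gradient descent with an Armijo backtracking stepsize.

    Illustrative only: the cited paper derives its adaptive stepsize from a
    decomposition theorem rather than from a line search.
    """
    for _ in range(iters):
        g = grad(x)
        t = t0
        # shrink the step until the sufficient-decrease condition holds
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= beta
        x = x - t * g
    return x

# Quartic objective with fourth-order growth at its minimum: f(x) = ||x||^4
f = lambda x: np.dot(x, x) ** 2
grad = lambda x: 4 * np.dot(x, x) * x
x_star = gd_backtracking(f, grad, np.array([1.0, -2.0]))
```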
arXiv Detail & Related papers (2024-09-29T21:27:00Z)
- On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a fixed step size is used.
We provide a proof of this in the case of linear neural networks with a squared loss.
We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient.
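The instability mechanism behind such results is already visible on a quadratic: the fixed-step GD map contracts only when the step stays below 2/L, where L is the curvature. A minimal numerical check (ours, not from the paper):

```python
# For the quadratic f(x) = (L/2) x^2 the GD iteration is x <- (1 - h*L) x,
# so any fixed step h > 2/L gives |1 - h*L| > 1 and the iterates diverge.
def gd_fixed(h, L=10.0, x=1.0, iters=50):
    for _ in range(iters):
        x -= h * L * x            # gradient of (L/2) x^2 is L * x
    return abs(x)

small = gd_fixed(h=0.05)          # h < 2/L = 0.2: factor |1 - 0.5|, contracts
large = gd_fixed(h=0.3)           # h > 2/L: factor |1 - 3| = 2, diverges
```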
arXiv Detail & Related papers (2024-02-20T16:01:42Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
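For readers unfamiliar with the augmented-Lagrangian idea underlying this line of work, a bare-bones sketch for an equality-constrained problem follows (plain gradient steps on the primal, multiplier updates on the dual; the cited paper instead uses an exact augmented Lagrangian with inexact Newton steps and randomized sketching):

```python
import numpy as np

def augmented_lagrangian(f_grad, c, c_jac, x, lam=0.0, rho=10.0,
                         outer=20, inner=200, lr=0.01):
    """Minimal augmented-Lagrangian loop for min f(x) s.t. c(x) = 0."""
    for _ in range(outer):
        for _ in range(inner):
            # gradient of L_rho(x, lam) = f(x) + lam*c(x) + (rho/2) c(x)^2
            g = f_grad(x) + (lam + rho * c(x)) * c_jac(x)
            x = x - lr * g
        lam = lam + rho * c(x)    # dual (multiplier) update
    return x, lam

# Example: minimize x0^2 + x1^2 subject to x0 + x1 = 1; solution (0.5, 0.5).
f_grad = lambda x: 2 * x
c = lambda x: x[0] + x[1] - 1.0
c_jac = lambda x: np.array([1.0, 1.0])
x_opt, lam_opt = augmented_lagrangian(f_grad, c, c_jac, np.array([0.0, 0.0]))
```

At the solution the KKT condition 2x + lam = 0 gives the multiplier lam = -1.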
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
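A toy numerical illustration of the two-step phenomenon (our own example on a loss of our choosing, not one from the paper): beyond the classical 2/L stability threshold the minimum itself repels, yet the two-step map can still converge, to an attracting period-2 orbit.

```python
import math

# GD on f(x) = sqrt(1 + x^2): gradient x / sqrt(1 + x^2), curvature 1 at x = 0.
# For step h > 2 the minimum x = 0 repels, but iterates settle into an
# attracting period-2 orbit at +/- sqrt(h^2/4 - 1), i.e. the TWO-step map
# converges even though the one-step analysis predicts instability.
def gd_orbit(h, x=1.0, iters=1000):
    for _ in range(iters):
        x = x - h * x / math.sqrt(1 + x * x)
    return x

h = 2.2
a = gd_orbit(h)                               # one point of the 2-cycle
b = a - h * a / math.sqrt(1 + a * a)          # one more step: the other point
```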
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\,u_i$, with linear dependence in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
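The linear-in-controls structure can be sketched with a forward-Euler integrator (our illustration; the vector fields and controls below are arbitrary choices, not from the paper):

```python
import numpy as np

def flow(x, fields, controls, dt=0.01):
    """Euler-integrate dx/dt = sum_i F_i(x) * u_i for piecewise-constant u.

    `controls` is a sequence of (u_1, ..., u_l) tuples, each held for time dt.
    """
    for u in controls:
        x = x + dt * sum(ui * F(x) for ui, F in zip(u, fields))
    return x

# Two fields on R^2: a rotation field and a constant translation field.
F1 = lambda x: np.array([-x[1], x[0]])   # rotation about the origin
F2 = lambda x: np.array([1.0, 0.0])      # translation along e1
# Hold u = (1, 0) for 100 steps of dt = 0.01: approximately a 1-radian rotation.
x1 = flow(np.array([1.0, 0.0]), [F1, F2], [(1.0, 0.0)] * 100)
```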
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
- A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers the double-sampling obstacle in solving the LP.
A deep parameterized augmented Lagrangian method is proposed.
arXiv Detail & Related papers (2021-05-20T13:08:06Z)
- Infinitesimal gradient boosting [0.0]
We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning.
We introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees.
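The discrete algorithm whose vanishing-learning-rate limit this paper studies can be sketched with depth-1 trees (stumps) on 1-D data; this is a generic shrinkage-boosting sketch of ours, not the paper's randomized-tree construction:

```python
import numpy as np

def boost_stumps(x, y, rounds=100, lr=0.1):
    """Tiny gradient boosting with stumps under squared loss."""
    pred = np.zeros_like(y)
    for _ in range(rounds):
        resid = y - pred                       # negative gradient of squared loss
        best = None
        for t in np.unique(x):                 # candidate split thresholds
            left, right = resid[x <= t], resid[x > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # explained sum of squares when fitting a mean on each side
            score = len(left) * left.mean() ** 2 + len(right) * right.mean() ** 2
            if best is None or score > best[0]:
                best = (score, t, left.mean(), right.mean())
        _, t, vl, vr = best
        # shrunken update: the learning rate lr is what tends to 0 in the
        # infinitesimal limit studied by the paper
        pred = pred + np.where(x <= t, lr * vl, lr * vr)
    return pred

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
pred = boost_stumps(x, y)
```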
arXiv Detail & Related papers (2021-04-26T15:09:05Z)
- Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems [10.112779201155005]
We study the limiting behaviors of three classic minimax algorithms: gradient descent-ascent (GDA), alternating gradient descent-ascent (AGDA), and the extragradient method (EGM).
Numerically, we observe that all limiting behaviors can arise in Generative Adversarial Networks (GANs) and are easily demonstrated for a range of GAN problems.
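The qualitative gap between these methods is easy to reproduce on the classic bilinear game f(x, y) = x*y (our own textbook illustration, not an experiment from the paper): simultaneous descent-ascent spirals outward while the extragradient method contracts.

```python
import numpy as np

def gda(x, y, h=0.1, iters=100):
    """Simultaneous gradient descent-ascent on f(x, y) = x*y."""
    for _ in range(iters):
        x, y = x - h * y, y + h * x          # both updates use the old iterate
    return np.hypot(x, y)

def egm(x, y, h=0.1, iters=100):
    """Extragradient method: update using the gradient at a lookahead point."""
    for _ in range(iters):
        xh, yh = x - h * y, y + h * x        # extrapolation (lookahead) step
        x, y = x - h * yh, y + h * xh        # update at the lookahead gradient
    return np.hypot(x, y)

r_gda = gda(1.0, 1.0)    # grows: per-step factor |1 + i*h| > 1
r_egm = egm(1.0, 1.0)    # shrinks for small h
```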
arXiv Detail & Related papers (2020-10-20T21:14:51Z)
- Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
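The deterministic backbone of such methods is the conditional-gradient (Frank-Wolfe) step; a minimal sketch follows (ours; it omits the paper's constraint subsampling, variance reduction, and smoothing):

```python
import numpy as np

def frank_wolfe(grad, lmo, x, iters=200):
    """Basic conditional-gradient (Frank-Wolfe) method.

    lmo(g) solves the linear minimization min_{s in C} <g, s> over the
    feasible set C; iterates stay feasible by taking convex combinations.
    """
    for k in range(iters):
        g = grad(x)
        s = lmo(g)                       # linear minimization oracle
        gamma = 2.0 / (k + 2.0)          # standard open-loop stepsize
        x = x + gamma * (s - x)
    return x

# Example: minimize ||x - c||^2 over the probability simplex (optimum is c).
c = np.array([0.2, 0.5, 0.3])
grad = lambda x: 2 * (x - c)
def lmo(g):                              # simplex vertex minimizing <g, s>
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s
x_fw = frank_wolfe(grad, lmo, np.ones(3) / 3)
```

Because each iterate is a convex combination of simplex vertices, feasibility never needs a projection step.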
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Linear Regression without Correspondences via Concave Minimization [24.823689223437917]
A signal is recovered in a linear regression setting without correspondences.
The associated maximum likelihood function is NP-hard to compute when the signal has dimension larger than one.
We reformulate it as a concave minimization problem, which we solve via branch-and-bound.
The resulting algorithm outperforms state-of-the-art methods for fully shuffled data and remains tractable for up to $8$-dimensional signals.
arXiv Detail & Related papers (2020-03-17T13:19:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.