Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures
- URL: http://arxiv.org/abs/2408.13114v1
- Date: Fri, 23 Aug 2024 14:39:27 GMT
- Title: Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures
- Authors: Michael Unser, Alexis Goujon, Stanislas Ducotterd
- Abstract summary: We present a general variational framework for the training of freeform nonlinearities in layered computational architectures.
The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility.
We show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis.
- Score: 14.93489065234423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the second-order total variation of each trainable activation. The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility. These properties are crucial to ensure the proper functioning of certain classes of signal-processing algorithms (e.g., plug-and-play schemes, unrolled proximal gradient, invertible flows). We prove that the global optimum of the stated constrained-optimization problem is achieved with nonlinearities that are adaptive nonuniform linear splines. We then show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis. Finally, we illustrate the use of our framework with the data-driven design of (weakly) convex regularizers for the denoising of images and the resolution of inverse problems.
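The following is a minimal PyTorch sketch of the main ingredients named in the abstract: a pointwise nonlinearity parameterized as a linear spline on a knot grid, a second-order total-variation penalty on that spline, and a slope constraint that keeps the activation monotone and 1-Lipschitz. It is an illustration only, not the authors' implementation: it uses uniform knots instead of the paper's nonuniform B-spline basis, enforces the slope constraint by a crude post-hoc projection rather than inside the solver, and all class and method names are our own.

```python
# Hedged sketch of a trainable linear-spline activation with a TV^(2) penalty
# and a slope constraint (uniform knots assumed; names are illustrative only).
import torch
import torch.nn as nn


class LinearSplineActivation(nn.Module):
    """Pointwise nonlinearity given by its values on a uniform knot grid."""

    def __init__(self, num_knots: int = 21, x_min: float = -3.0, x_max: float = 3.0):
        super().__init__()
        self.register_buffer("knots", torch.linspace(x_min, x_max, num_knots))
        self.h = (x_max - x_min) / (num_knots - 1)  # uniform knot spacing
        # Initialize as the identity, which already satisfies 0 <= slope <= 1.
        self.values = nn.Parameter(self.knots.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Piecewise-linear interpolation of the knot values; outside [x_min, x_max]
        # the boundary segments are extended linearly.
        idx = torch.clamp(torch.searchsorted(self.knots, x.detach()) - 1,
                          0, self.knots.numel() - 2)
        t = (x - self.knots[idx]) / self.h
        return self.values[idx] + t * (self.values[idx + 1] - self.values[idx])

    def tv2(self) -> torch.Tensor:
        # Second-order total variation of a linear spline = l1 norm of the jumps
        # of its derivative, i.e. of the second finite differences divided by h.
        return torch.abs(torch.diff(self.values, n=2)).sum() / self.h

    @torch.no_grad()
    def project_slopes(self, s_min: float = 0.0, s_max: float = 1.0) -> None:
        # Crude projection: clamp the finite-difference slopes and re-integrate.
        # (The paper handles the slope constraint inside the optimization itself.)
        slopes = torch.clamp(torch.diff(self.values) / self.h, s_min, s_max)
        new_values = torch.cat([self.values[:1],
                                self.values[0] + torch.cumsum(slopes, dim=0) * self.h])
        self.values.copy_(new_values)
```

In a training loop one would add a weighted penalty such as `lambda_tv * act.tv2()` (the weight name `lambda_tv` is ours) to the task loss and call `act.project_slopes()` after each optimizer step to keep the activation monotone and 1-Lipschitz. The l1-type TV^(2) penalty drives most second differences to zero, i.e. it favors splines with few active knots, which is consistent with the abstract's statement that the global optima are adaptive nonuniform linear splines.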
Related papers
- GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent [12.409030267572243]
We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem.
Based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem.
This is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free.
arXiv Detail & Related papers (2024-09-26T03:12:53Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We present applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators (a minimal sketch of the averaged-iteration step appears after this list).
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Optimization-Induced Graph Implicit Nonlinear Diffusion [64.39772634635273]
We propose a new graph convolution variant called Graph Implicit Nonlinear Diffusion (GIND).
GIND implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing.
We show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective.
arXiv Detail & Related papers (2022-06-29T06:26:42Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes [0.0]
We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes.
We prove that the effects responsible for the instability of these training problems are also the reason for the emergence of saddle points and spurious local minima.
arXiv Detail & Related papers (2020-11-06T11:34:59Z) - Ideal formulations for constrained convex optimization problems with indicator variables [2.578242050187029]
We study the convexification of a class of convex optimization problems with indicator variables and constraints on the indicators.
Unlike most of the previous work on convexification of sparse regression problems, we simultaneously consider the nonlinear non-separable objective, indicator variables, and constraints.
We derive ideal convexifications for problems with hierarchy, multi-collinearity, and sparsity constraints.
arXiv Detail & Related papers (2020-06-30T21:07:10Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
The algorithm is applied to bilinear problems in which one of the variables is subject to a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
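As a companion to the "Stable Nonconvex-Nonconcave Training via Linear Interpolation" entry above, which invokes the theory of nonexpansive operators (the same property the main paper's slope constraints enforce), here is a minimal sketch of the underlying averaged-operator step x_{k+1} = (1 - lam) * x_k + lam * T(x_k). This is the textbook Krasnosel'skii-Mann iteration, not the cited paper's full algorithm; the toy operator and all names below are ours.

```python
# Hedged sketch of linear-interpolation (Krasnosel'skii-Mann) averaging:
# x_{k+1} = (1 - lam) * x_k + lam * T(x_k).  When T is nonexpansive
# (1-Lipschitz), the averaged iteration converges to a fixed point of T even
# when the plain iteration x_{k+1} = T(x_k) would oscillate.
from typing import Callable
import torch


def km_iterate(base_step: Callable[[torch.Tensor], torch.Tensor],
               x0: torch.Tensor,
               lam: float = 0.5,
               num_iters: int = 200) -> torch.Tensor:
    x = x0.clone()
    for _ in range(num_iters):
        x = (1.0 - lam) * x + lam * base_step(x)  # interpolate toward T(x)
    return x


# Toy usage: a rotation is nonexpansive and its plain iteration circles the
# fixed point 0 forever; the averaged iteration contracts toward it.
theta = torch.tensor(0.5)
R = torch.stack([torch.stack([torch.cos(theta), -torch.sin(theta)]),
                 torch.stack([torch.sin(theta), torch.cos(theta)])])
x_star = km_iterate(lambda x: R @ x, torch.tensor([1.0, 0.0]))
print(x_star)  # close to the fixed point [0., 0.]
```

The same averaging step applies when `base_step` is one update of a training or optimization operator, which is the stabilization effect the entry above describes.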
This list is automatically generated from the titles and abstracts of the papers on this site.