Sparsest Univariate Learning Models Under Lipschitz Constraint
- URL: http://arxiv.org/abs/2112.13542v1
- Date: Mon, 27 Dec 2021 07:03:43 GMT
- Title: Sparsest Univariate Learning Models Under Lipschitz Constraint
- Authors: Shayan Aziznejad, Thomas Debarre, Michael Unser
- Abstract summary: We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
- Score: 31.28451181040038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Besides the minimization of the prediction error, two of the most desirable
properties of a regression scheme are stability and interpretability. Driven by
these principles, we propose continuous-domain formulations for one-dimensional
regression problems. In our first approach, we use the Lipschitz constant as a
regularizer, which results in an implicit tuning of the overall robustness of
the learned mapping. In our second approach, we control the Lipschitz constant
explicitly using a user-defined upper bound and make use of a
sparsity-promoting regularizer to favor simpler (and, hence, more
interpretable) solutions. The theoretical study of the latter formulation is
motivated in part by its equivalence, which we prove, with the training of a
Lipschitz-constrained two-layer univariate neural network with rectified linear
unit (ReLU) activations and weight decay. By proving representer theorems, we
show that both problems admit global minimizers that are continuous and
piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms
that find the sparsest solution of each problem: the CPWL mapping with the
least number of linear regions. Finally, we illustrate numerically the outcome
of our formulations.
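To make the neural-network side of the proved equivalence concrete, the following is a minimal sketch (not the authors' continuous-domain algorithm): it fits a two-layer univariate ReLU network with weight decay on the weights only, which yields a CPWL mapping, and it reports the simple Lipschitz upper bound sum_k |v_k||w_k| rather than enforcing the explicit user-defined cap of the second formulation.

```python
# Minimal sketch (not the authors' algorithm): two-layer univariate ReLU network
# with weight decay, the discrete counterpart of the paper's second formulation.
import torch

torch.manual_seed(0)
x = torch.linspace(-1, 1, 50).unsqueeze(1)            # 1D inputs
y = torch.sign(x) + 0.05 * torch.randn_like(x)        # noisy piecewise target

K, lam = 40, 1e-3                                     # hidden width, weight-decay strength
inner = torch.nn.Linear(1, K)
outer = torch.nn.Linear(K, 1)
opt = torch.optim.Adam(list(inner.parameters()) + list(outer.parameters()), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    pred = outer(torch.relu(inner(x)))                # a CPWL mapping of x
    reg = inner.weight.pow(2).sum() + outer.weight.pow(2).sum()   # decay on weights only
    loss = (pred - y).pow(2).mean() + lam * reg
    loss.backward()
    opt.step()

# Crude Lipschitz certificate for the learned CPWL map: sum_k |v_k| * |w_k|.
lip_bound = (outer.weight.abs() @ inner.weight.abs()).item()
print(f"training loss {loss.item():.4f}, Lipschitz upper bound ~ {lip_bound:.2f}")
```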
Related papers
- GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent [12.409030267572243]
We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem.
Based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem.
This is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free.
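As a toy illustration of the entropy-regularized LP idea (not the GLinSAT layer itself), consider an LP over the probability simplex: adding a small negative-entropy term gives a closed-form softmax solution that sharpens toward the LP vertex as the regularization vanishes.

```python
# Illustration only (not the GLinSAT layer): entropy-regularized LP on the simplex,
#   min_x  c.x + eps * sum_i x_i*log(x_i)   s.t.  sum_i x_i = 1, x >= 0,
# whose unique solution is softmax(-c/eps); eps -> 0 recovers the LP vertex.
import numpy as np

def entropic_lp(c, eps):
    """Closed-form minimizer of the entropy-regularized LP over the simplex."""
    z = -c / eps
    z = z - z.max()                      # shift for numerical stability
    x = np.exp(z)
    return x / x.sum()

c = np.array([0.3, 0.1, 0.7])
for eps in (1.0, 0.1, 0.01):
    print(eps, entropic_lp(c, eps))      # mass concentrates on argmin(c) as eps -> 0
```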
arXiv Detail & Related papers (2024-09-26T03:12:53Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
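A hedged toy illustration of the mechanism (not the paper's analysis): on the bilinear game f(x, y) = xy, plain simultaneous gradient descent-ascent spirals outward, whereas linearly interpolating back toward the previous iterate after a few fast steps contracts toward the equilibrium.

```python
# Toy illustration (not the paper's setting): linear interpolation ("lookahead")
# stabilizes simultaneous gradient descent-ascent on the bilinear game f(x, y) = x*y.
import numpy as np

def gda_steps(x, y, eta, k):
    """k simultaneous gradient descent-ascent steps on f(x, y) = x * y."""
    for _ in range(k):
        x, y = x - eta * y, y + eta * x
    return x, y

eta, k, alpha = 0.1, 5, 0.5
x, y = 1.0, 1.0                          # interpolated iterate
xs, ys = 1.0, 1.0                        # plain GDA iterate, for comparison
for _ in range(200):
    xf, yf = gda_steps(x, y, eta, k)     # k fast inner steps
    x, y = (1 - alpha) * x + alpha * xf, (1 - alpha) * y + alpha * yf
    xs, ys = gda_steps(xs, ys, eta, 1)

print("interpolated distance to equilibrium:", np.hypot(x, y))   # shrinks toward 0
print("plain GDA distance to equilibrium:  ", np.hypot(xs, ys))  # grows
```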
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
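In symbols, the claimed approximate equivalence (restated here in standard notation, with W_L...W_1 the end-to-end matrix of a depth-L linear network and L the training loss) reads:

```latex
\min_{\theta}\ \operatorname{tr}\!\big(\nabla_{\theta}^{2}\,\mathcal{L}(\theta)\big)
\;\approx\;
\min_{W_1,\dots,W_L}\ \big\| W_L W_{L-1}\cdots W_1 \big\|_{S_1},
\qquad
\|A\|_{S_1} \;=\; \sum_i \sigma_i(A).
```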
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Lipschitz Continuity Retained Binary Neural Network [52.17734681659175]
We introduce Lipschitz continuity as a rigorous criterion for defining the robustness of binary neural networks (BNNs).
We then propose to retain Lipschitz continuity as a regularization term to improve model robustness.
Our experiments show that this BNN-specific regularization method effectively strengthens the robustness of BNNs.
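For a generic illustration of a Lipschitz regularization term (not the paper's BNN-specific construction), one can penalize the product of per-layer spectral norms, a standard upper bound on a feed-forward network's Lipschitz constant:

```python
# Generic sketch (not the paper's BNN-specific term): penalize the product of
# per-layer spectral norms, estimated by a few power-iteration steps.
import torch

def spectral_norm(weight, iters=10):
    """Power-iteration estimate of the largest singular value of a 2D weight matrix."""
    v = torch.randn(weight.shape[1])
    for _ in range(iters):
        u = torch.nn.functional.normalize(weight @ v, dim=0)
        v = torch.nn.functional.normalize(weight.t() @ u, dim=0)
    return torch.dot(u, weight @ v)

def lipschitz_penalty(model):
    """Product of per-layer spectral norms: an upper bound on the Lipschitz constant."""
    bound = torch.tensor(1.0)
    for m in model.modules():
        if isinstance(m, torch.nn.Linear):
            bound = bound * spectral_norm(m.weight)
    return bound

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x, y = torch.randn(32, 8), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y) + 1e-2 * lipschitz_penalty(model)
loss.backward()                                   # the penalty discourages large layer gains
```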
arXiv Detail & Related papers (2022-07-13T22:55:04Z) - Decentralized Feature-Distributed Optimization for Generalized Linear
Models [19.800898945436384]
We consider the "all-for-one" decentralized learning problem for generalized linear models.
The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables.
We apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem.
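For reference, the generic Chambolle-Pock iteration for a saddle-point problem of the form min_x max_y <Kx, y> + g(x) - f*(y) is given below; the paper's decentralized instantiation is not reproduced here.

```latex
\begin{aligned}
y^{n+1} &= \operatorname{prox}_{\sigma f^{*}}\!\big(y^{n} + \sigma K \bar{x}^{n}\big),\\
x^{n+1} &= \operatorname{prox}_{\tau g}\!\big(x^{n} - \tau K^{*} y^{n+1}\big),\\
\bar{x}^{n+1} &= x^{n+1} + \theta\,\big(x^{n+1} - x^{n}\big),
\qquad \theta \in [0,1],\quad \sigma\tau\,\|K\|^{2} < 1.
\end{aligned}
```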
arXiv Detail & Related papers (2021-10-28T16:42:47Z) - Reinforcement Learning in Linear MDPs: Constant Regret and
Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and a significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
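A minimal sketch of the underlying fixed-point view (using the simpler Euclidean contraction condition rather than the paper's non-Euclidean ones): an implicit layer returns the equilibrium z* = phi(W z* + U x + b), and a 1-Lipschitz phi with spectral norm ||W|| < 1 makes the iteration a contraction, so the equilibrium exists, is unique, and the forward pass converges.

```python
# Sketch of an implicit (equilibrium) layer evaluated by fixed-point iteration.
# Euclidean contraction condition ||W||_2 < 1 shown here; the paper uses non-Euclidean ones.
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 8
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)          # enforce spectral norm ||W||_2 < 1 (well-posed)
U = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = rng.standard_normal(d)

z, err = np.zeros(n), np.inf
for k in range(500):                     # fixed-point iteration z <- phi(W z + U x + b)
    z_new = np.tanh(W @ z + U @ x + b)
    err = np.linalg.norm(z_new - z)
    z = z_new
    if err < 1e-10:
        break
print(f"equilibrium reached after {k + 1} iterations (residual {err:.1e})")
```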
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Lipschitz Bounded Equilibrium Networks [3.2872586139884623]
This paper introduces new parameterizations of equilibrium neural networks, i.e. networks defined by implicit equations.
The new parameterization admits a Lipschitz bound during training via unconstrained optimization.
In image classification experiments we show that the Lipschitz bounds are very accurate and improve robustness to adversarial attacks.
arXiv Detail & Related papers (2020-10-05T01:00:40Z) - Lipschitz Bounds and Provably Robust Training by Laplacian Smoothing [7.4769019455423855]
We formulate the adversarially robust learning problem as one of loss minimization with a Lipschitz constraint.
We show that the saddle point of the associated Lagrangian is characterized by a Poisson equation with weighted Laplace operator.
We design a provably robust training scheme using graph-based discretization of the input space and a primal-dual algorithm to converge to the Lagrangian's saddle point.
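In generic terms (a hedged restatement; the Poisson-equation characterization and the graph-based discretization are not reproduced here), the constrained learning problem and its Lagrangian are:

```latex
\min_{f}\ \mathbb{E}_{(x,y)}\big[\ell\big(f(x),\,y\big)\big]
\quad\text{s.t.}\quad \operatorname{Lip}(f)\le L,
\qquad
\mathcal{L}(f,\lambda)=\mathbb{E}\big[\ell(f(x),y)\big]+\lambda\big(\operatorname{Lip}(f)-L\big),
\quad \lambda\ge 0.
```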
arXiv Detail & Related papers (2020-06-05T22:02:21Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss given draws from some unknown distribution.
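A minimal sketch of the objective only (not the paper's approximation scheme): fit a single ReLU unit to i.i.d. draws by (sub)gradient descent on the empirical square loss.

```python
# Sketch of the ReLU-regression objective (not the paper's approximation scheme):
# minimize (1/m) * sum_i (relu(<w, x_i>) - y_i)^2 over w by subgradient descent.
import numpy as np

rng = np.random.default_rng(1)
m, d = 500, 5
X = rng.standard_normal((m, d))
w_true = rng.standard_normal(d)
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.standard_normal(m)   # noisy draws

w = rng.standard_normal(d)               # random init (zero init gives a zero subgradient)
for _ in range(500):
    pre = X @ w
    resid = np.maximum(pre, 0.0) - y
    grad = (2.0 / m) * (X.T @ (resid * (pre > 0)))   # subgradient of the square loss
    w -= 0.1 * grad
print("empirical square loss:", np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```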
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper bound on the Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)