Nonsmooth Implicit Differentiation for Machine Learning and Optimization
- URL: http://arxiv.org/abs/2106.04350v1
- Date: Tue, 8 Jun 2021 13:59:47 GMT
- Title: Nonsmooth Implicit Differentiation for Machine Learning and Optimization
- Authors: Jérôme Bolte (TSE), Tam Le (TSE), Edouard Pauwels (IRIT), Antonio Silveti-Falls (TSE)
- Abstract summary: In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus.
Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled.
This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In view of training increasingly complex learning architectures, we establish
a nonsmooth implicit function theorem with an operational calculus. Our result
applies to most practical problems (i.e., definable problems) provided that a
nonsmooth form of the classical invertibility condition is fulfilled. This
approach allows for formal subdifferentiation: for instance, replacing
derivatives by Clarke Jacobians in the usual differentiation formulas is fully
justified for a wide class of nonsmooth problems. Moreover this calculus is
entirely compatible with algorithmic differentiation (e.g., backpropagation).
We provide several applications such as training deep equilibrium networks,
training neural nets with conic optimization layers, or hyperparameter-tuning
for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we
present numerical experiments showcasing the extremely pathological gradient
dynamics one can encounter when applying implicit algorithmic differentiation
without any hypothesis.
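As a rough illustration of the calculus being justified (notation here is generic and not taken from the paper): if F(x, y(x)) = 0 defines y implicitly and every y-block of the Clarke Jacobian of F is invertible (the nonsmooth invertibility condition), then candidate Jacobians for y are obtained exactly as in the smooth case, with the Clarke Jacobian in place of the classical one:

```latex
% Illustrative sketch only: generic notation, not copied from the paper.
% F(x, y(x)) = 0 defines y(x); \partial^c F denotes the Clarke Jacobian of F.
\[
  F\bigl(x,\, y(x)\bigr) = 0,
  \qquad
  -A_y^{-1} A_x \ \ \text{for}\ \ [\,A_x \;\; A_y\,] \in \partial^c F\bigl(x,\, y(x)\bigr),
\]
% provided every block A_y is invertible (the nonsmooth invertibility condition),
% mirroring the smooth formula  y'(x) = -\,[\partial_y F]^{-1}\, \partial_x F.
```

This is the sense in which "replacing derivatives by Clarke Jacobians in the usual differentiation formulas" is meant above.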
Related papers
- A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization [0.0]
We consider optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector.
We propose a specialized single-scale method for nonconvex constrained learning problems with a smooth outer function and a different conditional inner function.
arXiv Detail & Related papers (2024-05-17T14:35:50Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z) - Stochastic Langevin Differential Inclusions with Applications to Machine Learning [5.274477003588407]
We show some foundational results regarding the flow and properties of Langevin-type differential inclusions.
In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
arXiv Detail & Related papers (2022-06-23T08:29:17Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework for adaptive algorithms to solve general problems.
In particular, our novel framework provides convergence analysis support for adaptive methods under the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Efficient and Modular Implicit Differentiation [68.74748174316989]
We propose a unified, efficient and modular approach for implicit differentiation of optimization problems.
We show that seemingly simple principles allow us to recover many recently proposed implicit differentiation methods and to create new ones easily (a minimal sketch of fixed-point implicit differentiation in this spirit appears after this list).
arXiv Detail & Related papers (2021-05-31T17:45:58Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth
Non-Convex Optimization [25.680334940504405]
This paper establishes the convergence rate of a stochastic subgradient method with momentum for constrained non-smooth problems.
We also show how the unconstrained case can be analyzed under weaker assumptions than the state of the art.
arXiv Detail & Related papers (2020-02-13T12:10:17Z)
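Below is a minimal, hypothetical JAX sketch of implicit differentiation through a nonsmooth fixed-point layer, in the spirit of the deep-equilibrium application of the main paper and the modular implicit-differentiation entry above. The map `f`, the solver, the iteration counts, and all names are illustrative assumptions, not code from any of these papers.

```python
# Hypothetical sketch: implicit differentiation of a nonsmooth fixed point
# z* = f(x, z*), deep-equilibrium style. Names and the map `f` are illustrative.
import jax
import jax.numpy as jnp

def f(x, z):
    # Contractive piecewise-smooth map (ReLU makes it nonsmooth in z).
    return jnp.tanh(x + 0.5 * jax.nn.relu(z))

def solve_fixed_point(x, z0, n_iter=100):
    # Plain fixed-point iteration; any solver could be substituted here.
    z = z0
    for _ in range(n_iter):
        z = f(x, z)
    return z

@jax.custom_vjp
def fixed_point_layer(x):
    return solve_fixed_point(x, jnp.zeros_like(x))

def fixed_point_fwd(x):
    z_star = fixed_point_layer(x)
    return z_star, (x, z_star)

def fixed_point_bwd(residuals, v):
    x, z_star = residuals
    _, vjp_z = jax.vjp(lambda z: f(x, z), z_star)
    _, vjp_x = jax.vjp(lambda x_: f(x_, z_star), x)
    # Solve u = v + (df/dz)^T u by iteration, i.e. u = (I - J_z)^{-T} v,
    # then push u through (df/dx)^T; this mirrors the implicit formula
    # without ever forming or inverting a Jacobian explicitly.
    u = v
    for _ in range(100):
        u = v + vjp_z(u)[0]
    return (vjp_x(u)[0],)

fixed_point_layer.defvjp(fixed_point_fwd, fixed_point_bwd)

if __name__ == "__main__":
    x = jnp.array([0.1, -0.3, 0.7])
    print(jax.grad(lambda x: fixed_point_layer(x).sum())(x))
```

The backward pass solves the small linear fixed-point equation u = v + J_z^T u instead of materializing a Jacobian, which is what makes this style of implicit differentiation compatible with reverse-mode algorithmic differentiation (backpropagation).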