Stochastic Langevin Differential Inclusions with Applications to Machine Learning
- URL: http://arxiv.org/abs/2206.11533v3
- Date: Sun, 12 May 2024 14:32:43 GMT
- Title: Stochastic Langevin Differential Inclusions with Applications to Machine Learning
- Authors: Fabio V. Difonzo, Vyacheslav Kungurtsev, Jakub Marecek
- Abstract summary: We show some foundational results regarding the flow and properties of Langevin-type Differential Inclusions.
In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
- Score: 5.274477003588407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in training over-parameterized models. However, the literature typically assumes smoothness of the potential, whose gradient is the drift term. Nevertheless, there are many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere. This is exemplified by robust losses and Rectified Linear Units in regression problems. In this paper, we show some foundational results regarding the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to the machine-learning settings. In particular, we show strong existence of the solution, as well as an asymptotic minimization of the canonical free-energy functional.
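At kinks of a nonsmooth potential (such as the robust loss |x|), the drift of the Langevin SDE is set-valued, and a discretization must pick a subgradient selection. The following is a minimal sketch, not code from the paper, of an Euler-Maruyama scheme for dX = -dU(X) dt + sqrt(2/beta) dW with U(x) = |x|; all names and parameter values are illustrative assumptions.

```python
import math
import random

def subgrad_abs(x):
    """A measurable selection from the subdifferential of U(x) = |x|.

    At x = 0 the subdifferential is the whole interval [-1, 1];
    we pick 0, but any selection in [-1, 1] would be admissible.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

def langevin_step(x, step, beta, rng):
    """One Euler-Maruyama step of dX = -dU(X) dt + sqrt(2/beta) dW."""
    noise = rng.gauss(0.0, math.sqrt(step))
    return x - step * subgrad_abs(x) + math.sqrt(2.0 / beta) * noise

def sample(x0, steps, step=1e-2, beta=10.0, seed=0):
    """Run the chain for `steps` iterations and return the final iterate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = langevin_step(x, step, beta, rng)
    return x
```

For large inverse temperature beta, the iterates concentrate near x = 0, the minimizer of |x|, consistent with the free-energy minimization described in the abstract.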
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the stochastic gradient descent algorithm can be characterized via the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
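The infinite-variance regime can be illustrated with a toy simulation: SGD on a one-dimensional quadratic objective with symmetric Pareto-tailed gradient noise, which has infinite variance for tail index alpha < 2. This is only an illustrative sketch of the noise regime the paper studies (the paper's characterization concerns the continuous-time limit); all names are hypothetical.

```python
import random

def sgd_heavy_tailed(theta0, lr, alpha, n_steps, seed=0):
    """SGD on f(theta) = theta^2 / 2 with heavy-tailed gradient noise.

    Noise = sign * (Pareto(alpha) - 1) is symmetric with mean zero;
    for alpha < 2 it has infinite variance.
    """
    rng = random.Random(seed)
    theta = theta0
    path = [theta]
    for _ in range(n_steps):
        noise = rng.choice([-1.0, 1.0]) * (rng.paretovariate(alpha) - 1.0)
        grad = theta + noise  # true gradient of theta^2/2 plus noise
        theta -= lr * grad
        path.append(theta)
    return path
```

Occasional very large noise draws produce the jump-like excursions that motivate a Lévy-driven, rather than Brownian-driven, limiting process.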
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
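A basic instance of stochastic smoothing is the Gaussian-smoothed gradient estimator: perturb the input with Gaussian noise and use the score-function identity, here with antithetic sampling as a simple variance-reduction step. This is a generic sketch of the technique, not the paper's estimator; all names are illustrative.

```python
import math
import random

def smoothed_grad(f, x, sigma=0.1, n=1000, seed=0):
    """Monte Carlo estimate of d/dx E[f(x + sigma*Z)], Z ~ N(0, 1).

    Uses the score-function identity grad = E[f(x + sigma*Z) * Z] / sigma,
    with antithetic pairs (+z, -z) to reduce variance: any part of f that
    is constant around x cancels exactly in the difference.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += (f(x + sigma * z) - f(x - sigma * z)) * z / (2.0 * sigma)
    return total / n

def heaviside(t):
    """A non-differentiable test function: the step function."""
    return 1.0 if t > 0 else 0.0
```

At x = 0 the smoothed step function has derivative 1 / (sigma * sqrt(2*pi)), roughly 3.99 for sigma = 0.1, which the estimator recovers despite f itself having no derivative there.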
arXiv Detail & Related papers (2024-10-10T17:10:00Z) - Online Non-Stationary Stochastic Quasar-Convex Optimization [1.9244735303181755]
Recent research has shown that quasar-convexity can be found in applications such as identification of linear systems and logistic regression models.
We develop algorithms that exploit quasar-convexity in a dynamic (non-stationary) environment.
arXiv Detail & Related papers (2024-07-04T03:24:27Z) - BrowNNe: Brownian Nonlocal Neurons & Activation Functions [0.0]
Our experiments indicate that Brownian neural activation functions beat their ReLU counterparts in low-training-data regimes.
arXiv Detail & Related papers (2024-06-21T19:40:30Z) - A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization [0.0]
We consider optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector.
We propose a specialized single time-scale method for unconstrained learning problems with a smooth outer function and a generalized differentiable conditional inner function.
arXiv Detail & Related papers (2024-05-17T14:35:50Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
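The variance-reduction idea can be sketched as SVRG-style stochastic gradient Langevin dynamics on a toy finite-sum potential: the per-sample gradient is re-centered by a full gradient computed at a snapshot once per epoch. This is a generic sketch under assumed names and parameters, not the authors' implementation.

```python
import math
import random

def svrg_langevin(a, x0, step=0.05, beta=50.0, epochs=20, seed=0):
    """SVRG-style Langevin dynamics for U(x) = sum_i (x - a[i])^2 / (2n).

    Per-sample gradient: (x - a[i]). The control variate
    g = grad_i(x) - grad_i(snapshot) + full_grad(snapshot)
    is unbiased and has reduced variance near the snapshot.
    """
    rng = random.Random(seed)
    n = len(a)
    x = x0
    for _ in range(epochs):
        snapshot = x
        full_grad = sum(snapshot - ai for ai in a) / n
        for _ in range(n):
            i = rng.randrange(n)
            g = (x - a[i]) - (snapshot - a[i]) + full_grad
            x += -step * g + math.sqrt(2.0 * step / beta) * rng.gauss(0.0, 1.0)
    return x
```

For large beta, samples concentrate near the mean of the data points `a`, the minimizer of this quadratic potential.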
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - On Uniform Boundedness Properties of SGD and its Momentum Variants [38.41217525394239]
We investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent (SGD) algorithm.
We show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values.
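The two schedules named above, step-decay and cosine with restarts, can be sketched directly; the function names and default values are illustrative, not from the paper.

```python
import math

def cosine_restart_lr(t, lr_max, lr_min, period):
    """Cosine step-size schedule with a warm restart every `period` steps."""
    phase = (t % period) / period  # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * phase))

def step_decay_lr(t, lr0, drop=0.5, every=100):
    """Step-decay schedule: multiply the step size by `drop` every `every` steps."""
    return lr0 * drop ** (t // every)
```

At the start of each cycle the cosine schedule returns lr_max, decays to lr_min mid-cycle, and restarts; step-decay produces the familiar piecewise-constant staircase.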
arXiv Detail & Related papers (2022-01-25T11:34:56Z) - Nonsmooth Implicit Differentiation for Machine Learning and Optimization [0.0]
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus.
Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled.
This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified.
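Formal subdifferentiation of this kind can be illustrated with ReLU: in the chain rule for a loss built on relu(w*x), the classical derivative is replaced by an element of the Clarke subdifferential, which at the kink is any value in [0, 1]. A minimal sketch with illustrative names:

```python
def relu(x):
    return max(x, 0.0)

def relu_clarke(x):
    """An element of the Clarke subdifferential of ReLU.

    For x != 0 this is the classical derivative; at x = 0 the Clarke
    subdifferential is [0, 1], and any selection (here 0.0) is a valid
    choice in the formal chain rule.
    """
    if x > 0:
        return 1.0
    return 0.0

def d_loss(w, x, y):
    """'Derivative' of L(w) = (relu(w*x) - y)^2 / 2 via the formal chain
    rule, with the ReLU derivative replaced by a Clarke subgradient."""
    pred = relu(w * x)
    return (pred - y) * relu_clarke(w * x) * x
```

This is exactly the convention automatic-differentiation frameworks apply to ReLU networks, which the theorem above places on rigorous footing for definable problems.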
arXiv Detail & Related papers (2021-06-08T13:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.