Differentiable Programming à la Moreau
- URL: http://arxiv.org/abs/2012.15458v1
- Date: Thu, 31 Dec 2020 05:56:51 GMT
- Title: Differentiable Programming à la Moreau
- Authors: Vincent Roulet and Zaid Harchaoui
- Abstract summary: We define a compositional calculus adapted to Moreau envelopes and show how to integrate it within differentiable programming.
The proposed framework casts in a mathematical optimization framework several variants of gradient back-propagation related to the idea of the propagation of virtual targets.
- Score: 4.289574109162585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The notion of a Moreau envelope is central to the analysis of first-order
optimization algorithms for machine learning. Yet, it has not been developed
and extended to be applied to a deep network and, more broadly, to a machine
learning system with a differentiable programming implementation. We define a
compositional calculus adapted to Moreau envelopes and show how to integrate it
within differentiable programming. The proposed framework casts in a
mathematical optimization framework several variants of gradient
back-propagation related to the idea of the propagation of virtual targets.
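For context, the Moreau envelope and proximal operator referred to above have the following standard definitions (textbook facts, not the paper's compositional extension): for a function $f$ and smoothing parameter $\lambda > 0$,

```latex
% Standard definitions of the Moreau envelope and proximal operator.
M_{\lambda f}(x) \;=\; \min_{y}\, \Big\{ f(y) + \tfrac{1}{2\lambda}\,\lVert y - x \rVert^2 \Big\},
\qquad
\mathrm{prox}_{\lambda f}(x) \;=\; \operatorname*{arg\,min}_{y}\, \Big\{ f(y) + \tfrac{1}{2\lambda}\,\lVert y - x \rVert^2 \Big\}.
```

For convex $f$, the envelope is differentiable even when $f$ is not, with $\nabla M_{\lambda f}(x) = (x - \mathrm{prox}_{\lambda f}(x))/\lambda$, which is what makes it useful for smoothing nonsmooth objectives in first-order methods.

As a minimal illustrative sketch (not code from the paper), the envelope of $f(x) = |x|$ can be evaluated through its proximal operator, which is soft-thresholding:

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of f(y) = |y|: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_env_abs(x, lam):
    """Moreau envelope of |.| evaluated through its prox:
    M(x) = f(p) + ||p - x||^2 / (2*lam) with p = prox(x)."""
    p = prox_abs(x, lam)
    return np.abs(p) + (p - x) ** 2 / (2.0 * lam)

def moreau_grad_abs(x, lam):
    """Gradient of the envelope: (x - prox(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam

if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 5)
    lam = 0.5
    print(moreau_env_abs(x, lam))   # Huber-like smoothing of |x|
    print(moreau_grad_abs(x, lam))  # bounded, smooth surrogate gradient
```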
Related papers
- The Elements of Differentiable Programming [14.197724178748176]
Differentiable programming enables end-to-end differentiation of complex computer programs.
Differentiable programming builds upon several areas of computer science and applied mathematics.
arXiv Detail & Related papers (2024-03-21T17:55:16Z)
- Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods [8.72339110741777]
Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services.
Deep learning models suffer from limited generalization and adaptability to dynamic slicing configurations.
We propose a novel framework that integrates constrained optimization methods and deep learning models.
arXiv Detail & Related papers (2024-01-22T07:19:16Z)
- A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning [46.98201017084005]
We provide a framework for gradient-based algorithms in machine learning.
We start with saddle points and monotone games, and proceed to general variational inequalities.
While we provide convergence proofs for several of the algorithms, our main focus is that of providing motivation and intuition.
arXiv Detail & Related papers (2023-09-09T21:36:51Z)
- Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics [1.0587959762260988]
We discuss several possible gradient estimation strategies, including the recent AD method, and compare them in simplified detector design experiments.
In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.
arXiv Detail & Related papers (2023-08-31T12:32:34Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [84.90242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- Differentiable Spline Approximations [48.10988598845873]
Differentiable programming has significantly enhanced the scope of machine learning.
Standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable.
We show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications.
arXiv Detail & Related papers (2021-10-04T16:04:46Z)
- A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z)
- An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks [2.715884199292287]
We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations.
We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks; a standard composition bound is sketched after this entry.
arXiv Detail & Related papers (2020-02-20T22:40:52Z)
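The last entry above refers to smoothness constants of deep networks. As general background (a standard composition bound, not that paper's specific estimate): if $g$ is $l_g$-Lipschitz and $L_g$-smooth, and $f$ is $l_f$-Lipschitz and $L_f$-smooth, then the composition $f \circ g$ is smooth with constant bounded by

```latex
% Smoothness (Lipschitz-gradient) constant of a composition f \circ g:
L_{f \circ g} \;\le\; L_f\, l_g^{\,2} \;+\; l_f\, L_g .
```

Applying this bound layer by layer yields computable estimates of a deep network's smoothness constant from the Lipschitz and smoothness constants of its individual layers.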
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.