Learning Gradients of Convex Functions with Monotone Gradient Networks
- URL: http://arxiv.org/abs/2301.10862v1
- Date: Wed, 25 Jan 2023 23:04:50 GMT
- Title: Learning Gradients of Convex Functions with Monotone Gradient Networks
- Authors: Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
- Abstract summary: Gradients of convex functions have critical applications ranging from gradient-based optimization to optimal transport.
Recent works have explored data-driven methods for learning convex objectives, but learning their monotone gradients is seldom studied.
We show that our networks are simpler to train, learn monotone gradient fields more accurately, and use significantly fewer parameters than state-of-the-art methods.
- Score: 5.220940151628734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While much effort has been devoted to deriving and studying effective convex
formulations of signal processing problems, the gradients of convex functions
also have critical applications ranging from gradient-based optimization to
optimal transport. Recent works have explored data-driven methods for learning
convex objectives, but learning their monotone gradients is seldom studied. In
this work, we propose Cascaded and Modular Monotone Gradient Networks (C-MGN
and M-MGN respectively), two monotone gradient neural network architectures for
directly learning the gradients of convex functions. We show that our networks
are simpler to train, learn monotone gradient fields more accurately, and use
significantly fewer parameters than state-of-the-art methods. We further
demonstrate their ability to learn optimal transport mappings to augment
driving image data.
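For readers who want a concrete picture of what such an architecture can look like, the sketch below implements a single monotone-gradient module in PyTorch. The specific layer form a + V^T V x + sum_k W_k^T sigma(W_k x + b_k), the layer sizes, and the tanh activation are illustrative assumptions rather than the exact C-MGN or M-MGN layers from the paper; the point is only that any map of this form has a symmetric positive semidefinite Jacobian and is therefore the gradient of some convex function.

```python
# Minimal sketch of a monotone-gradient module. The layer form is an
# assumption inspired by the abstract, not the paper's exact C-MGN/M-MGN
# definition.
import torch
import torch.nn as nn


class MonotoneGradientModule(nn.Module):
    """Maps x to a + V^T V x + sum_k W_k^T sigma(W_k x + b_k).

    The Jacobian V^T V + sum_k W_k^T diag(sigma'(.)) W_k is symmetric positive
    semidefinite because sigma is elementwise and non-decreasing, so the output
    is a monotone vector field, i.e. the gradient of a convex function.
    """

    def __init__(self, dim: int, hidden: int = 32, num_blocks: int = 3):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))
        self.V = nn.Parameter(torch.randn(hidden, dim) / dim ** 0.5)
        self.W = nn.ParameterList(
            [nn.Parameter(torch.randn(hidden, dim) / dim ** 0.5) for _ in range(num_blocks)]
        )
        self.b = nn.ParameterList(
            [nn.Parameter(torch.zeros(hidden)) for _ in range(num_blocks)]
        )
        self.act = nn.Tanh()  # any non-decreasing elementwise activation works

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> (batch, dim)
        out = self.a + x @ (self.V.T @ self.V)
        for W, b in zip(self.W, self.b):
            out = out + self.act(x @ W.T + b) @ W
        return out


if __name__ == "__main__":
    net = MonotoneGradientModule(dim=2)
    x = torch.randn(5, 2)
    print(net(x).shape)  # torch.Size([5, 2])
```

A toy sketch that fits this module to a transport-style target appears after the related-papers list below.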
Related papers
- Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training [5.9408311406202285]
The Dimer method is a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface.
Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO).
DEO guides the optimizer away from saddle points and flat regions, enhancing training efficiency with non-stepwise updates.
arXiv Detail & Related papers (2025-07-26T14:57:32Z) - GradNetOT: Learning Optimal Transport Maps with GradNets [11.930694410868435]
In [arXiv:2301.10862] and [arXiv:2404.07361], we proposed Monotone Gradient Networks (mGradNets), neural networks that directly parameterize the space of monotone gradient maps.
We empirically show that the structural bias of mGradNets facilitates the learning of optimal transport maps and employ our method for a robot swarm control problem.
arXiv Detail & Related papers (2025-07-17T14:59:24Z) - GradMetaNet: An Equivariant Architecture for Learning on Gradients [18.350495600116712]
We introduce GradMetaNet, a novel architecture for learning on gradients.
We prove results for GradMetaNet, and show that previous approaches cannot approximate natural gradient-based functions.
We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks.
arXiv Detail & Related papers (2025-07-02T12:22:39Z) - Optimistic Gradient Learning with Hessian Corrections for High-Dimensional Black-Box Optimization [14.073853819633745]
Black-box algorithms are designed to optimize functions without relying on their underlying analytical structure or gradient information.
We propose two novel gradient learning variants to address the challenges posed by high-dimensional, complex, and highly non-linear problems.
arXiv Detail & Related papers (2025-02-07T11:03:50Z) - Fast and Slow Gradient Approximation for Binary Neural Network Optimization [11.064044986709733]
Hypernetwork-based methods utilize neural networks to learn the gradients of non-differentiable quantization functions.
We propose a Historical Gradient Storage (HGS) module, which models the historical gradient sequence to generate the first-order momentum required for optimization.
We also introduce Layer Recognition Embeddings (LRE) into the hypernetwork, facilitating the generation of layer-specific fine gradients.
arXiv Detail & Related papers (2024-12-16T13:48:40Z) - Gradient Networks [11.930694410868435]
We provide a comprehensive GradNet design framework to represent gradients of convex functions.
We show that GradNets can approximate neural gradient functions.
We also show that monotone GradNets provide efficient parameterizations and outperform existing methods.
arXiv Detail & Related papers (2024-04-10T21:36:59Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Can we learn gradients by Hamiltonian Neural Networks? [68.8204255655161]
We propose a meta-learner based on ODE neural networks that learns gradients.
We demonstrate that our method outperforms a meta-learner based on LSTM for an artificial task and the MNIST dataset with ReLU activations in the optimizee.
arXiv Detail & Related papers (2021-10-31T18:35:10Z) - Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
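Tying the list back to the main abstract's optimal transport experiments (and to the GradNetOT entry above), here is a deliberately simplified usage sketch that fits the hypothetical MonotoneGradientModule from the earlier sketch to paired source/target samples with plain squared-error regression. It is not the paper's training procedure or a proper optimal transport solver; it only illustrates that, once the map is constrained to be a monotone gradient field, learning a transport-style mapping reduces to ordinary supervised regression.

```python
# Hypothetical usage sketch: regress the MonotoneGradientModule from the
# earlier example onto paired (source, target) samples. Squared-error
# regression on paired samples is an assumed stand-in for a real optimal
# transport objective.
import torch

torch.manual_seed(0)

# Toy paired data: the target is an affine push-forward of the source with a
# symmetric positive definite matrix, i.e. itself the gradient of a convex
# quadratic, so it lies inside the model class.
A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
x_src = torch.randn(512, 2)
y_tgt = x_src @ A + torch.tensor([1.0, -2.0])

net = MonotoneGradientModule(dim=2, hidden=16, num_blocks=2)  # defined in the earlier sketch
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = torch.mean((net(x_src) - y_tgt) ** 2)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.4f}")  # should approach zero on this toy problem
```

Because the toy target is the gradient of a convex quadratic, the constrained model can match it closely; mapping between real empirical distributions would of course require an optimal transport objective rather than paired regression.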