A Gentle Introduction to Gradient-Based Optimization and Variational
Inequalities for Machine Learning
- URL: http://arxiv.org/abs/2309.04877v2
- Date: Mon, 26 Feb 2024 05:05:42 GMT
- Title: A Gentle Introduction to Gradient-Based Optimization and Variational
Inequalities for Machine Learning
- Authors: Neha S. Wadia, Yatin Dandi, and Michael I. Jordan
- Abstract summary: We provide a framework for gradient-based algorithms in machine learning.
We start with saddle points and monotone games, and proceed to general variational inequalities.
While we provide convergence proofs for several of the algorithms, our main focus is that of providing motivation and intuition.
- Score: 46.98201017084005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress in machine learning in recent years has been based on a
highly productive connection to gradient-based optimization. Further progress
hinges in part on a shift in focus from pattern recognition to decision-making
and multi-agent problems. In these broader settings, new mathematical
challenges emerge that involve equilibria and game theory instead of optima.
Gradient-based methods remain essential -- given the high dimensionality and
large scale of machine-learning problems -- but simple gradient descent is no
longer the point of departure for algorithm design. We provide a gentle
introduction to a broader framework for gradient-based algorithms in machine
learning, beginning with saddle points and monotone games, and proceeding to
general variational inequalities. While we provide convergence proofs for
several of the algorithms that we present, our main focus is that of providing
motivation and intuition.
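To make the shift from optima to equilibria concrete, here is a minimal sketch (our own illustration, not code from the paper) on the bilinear saddle-point problem min_x max_y x*y: simultaneous gradient descent-ascent spirals away from the unique equilibrium at (0, 0), while the extragradient method, which evaluates the gradient field at an extrapolated half-step before each update, converges. The step size and iteration count below are arbitrary choices for illustration.

```python
# Toy sketch (our illustration): gradient descent-ascent vs. the
# extragradient method on min_x max_y f(x, y) = x * y, whose unique
# equilibrium is (0, 0).

def field(x, y):
    # Gradient field: descend on x (df/dx = y), ascend on y (df/dy = x).
    return y, -x

def gda(x, y, lr=0.1, steps=500):
    # Simultaneous gradient descent-ascent: spirals outward on this problem.
    for _ in range(steps):
        gx, gy = field(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

def extragradient(x, y, lr=0.1, steps=500):
    # Extragradient: evaluate the field at an extrapolated half-step,
    # then update from the current point with that look-ahead field.
    for _ in range(steps):
        gx, gy = field(x, y)
        xh, yh = x - lr * gx, y - lr * gy  # extrapolation step
        gx, gy = field(xh, yh)
        x, y = x - lr * gx, y - lr * gy    # update step
    return x, y

print(gda(1.0, 1.0))            # drifts far from (0, 0)
print(extragradient(1.0, 1.0))  # approaches the saddle point (0, 0)
```

The same look-ahead idea extends beyond saddle points: writing the problem as a variational inequality (find x* in a constraint set K with <F(x*), x - x*> >= 0 for all x in K), extragradient-style methods apply whenever the operator F is monotone, which is the general setting the paper builds up to.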
Related papers
- Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators [13.343937277604892]
We study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators.
We propose an algorithm that omits estimation of the policy Hessian and applies to learning a set of heterogeneous but similar linear dynamical systems.
We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective.
arXiv Detail & Related papers (2024-05-27T17:26:36Z) - Neural Algorithmic Reasoning Without Intermediate Supervision [21.852775399735005]
We focus on learning neural algorithmic reasoning only from input-output pairs, without appealing to intermediate supervision.
We build a self-supervised objective that can regularise intermediate computations of the model without access to the algorithm trajectory.
We demonstrate that our approach is competitive with its trajectory-supervised counterpart on tasks from the CLRS Algorithmic Reasoning Benchmark.
arXiv Detail & Related papers (2023-06-23T09:57:44Z) - Understanding A Class of Decentralized and Federated Optimization
Algorithms: A Multi-Rate Feedback Control Perspective [41.05789078207364]
We provide a fresh perspective to understand, analyze, and design distributed optimization algorithms.
We show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system.
arXiv Detail & Related papers (2022-04-27T01:53:57Z) - Learning for Spatial Branching: An Algorithm Selection Approach [0.0]
We develop a learning framework for branching and show its efficacy in the context of non-linear optimization problems.
The proposed learning is performed offline, based on instance-specific features and with no computational overhead when solving new instances.
Experiments on different benchmark instances from the literature show that the learning-based branching rule significantly outperforms the standard rules.
arXiv Detail & Related papers (2022-04-22T17:23:43Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (a toy OGD sketch appears after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Physical Gradients for Deep Learning [101.36788327318669]
We find that state-of-the-art training techniques are not well-suited to many problems that involve physical processes.
We propose a novel hybrid training approach that combines higher-order optimization methods with machine learning techniques.
arXiv Detail & Related papers (2021-09-30T12:14:31Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework of adaptive algorithms to solve general problems.
In particular, our novel framework provides convergence analysis support for adaptive gradient methods under the non-convex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Investigating Bi-Level Optimization for Learning and Vision from a
Unified Perspective: A Survey and Beyond [114.39616146985001]
In machine learning and computer vision, despite different motivations and mechanisms, many complex problems contain a series of closely related subproblems.
In this paper, we first uniformly express these complex learning and vision problems from the perspective of Bi-Level Optimization (BLO).
Then we construct a value-function-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies.
arXiv Detail & Related papers (2021-01-27T16:20:23Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
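To make the pairwise-learning entry above concrete, here is a minimal sketch (our own illustration under assumed details, not the algorithm from that paper): online gradient descent where each incoming example is paired with a randomly buffered past example under a pairwise hinge loss, as in AUC-style bipartite ranking with a linear model. The function names and the buffering scheme are our assumptions.

```python
# Toy sketch (our illustration): OGD for pairwise learning with a
# pairwise hinge loss on a linear model. Labels are +1 / -1.
import random

def pairwise_hinge_grad(w, xi, yi, xj, yj):
    # Subgradient w.r.t. w of max(0, 1 - s * w.(xi - xj)),
    # where s = (yi - yj) / 2; nonzero only for opposite-label pairs.
    if yi == yj:
        return [0.0] * len(w)
    s = (yi - yj) / 2  # +1 if yi > yj, -1 otherwise
    diff = [a - b for a, b in zip(xi, xj)]
    margin = s * sum(wk * dk for wk, dk in zip(w, diff))
    if margin >= 1:
        return [0.0] * len(w)
    return [-s * dk for dk in diff]

def ogd_pairwise(stream, dim, lr=0.1):
    # On each new example, pair it with one random past example and
    # take a single gradient step; keep all past examples in a buffer.
    w, buffer = [0.0] * dim, []
    for xi, yi in stream:
        if buffer:
            xj, yj = random.choice(buffer)
            g = pairwise_hinge_grad(w, xi, yi, xj, yj)
            w = [wk - lr * gk for wk, gk in zip(w, g)]
        buffer.append((xi, yi))
    return w
```

Calling ogd_pairwise on a stream of (feature_list, label) pairs returns the learned weight vector; variants that subsample or bound the buffer trade memory for statistical accuracy, which is the kind of trade-off such papers analyze.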