Is All Learning (Natural) Gradient Descent?
- URL: http://arxiv.org/abs/2409.16422v1
- Date: Tue, 24 Sep 2024 19:41:08 GMT
- Title: Is All Learning (Natural) Gradient Descent?
- Authors: Lucas Shoji, Kenta Suzuki, Leo Kozachkov,
- Abstract summary: We show that a class of effective learning rules can be as natural gradient descent with respect to a suitably defined loss function and metric.
We also demonstrate that these metrics have a canonical form and identify several optimal ones, including the metric that achieves the minimum possible condition number.
- Score: 1.3654846342364308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper shows that a wide class of effective learning rules -- those that improve a scalar performance measure over a given time window -- can be rewritten as natural gradient descent with respect to a suitably defined loss function and metric. Specifically, we show that parameter updates within this class of learning rules can be expressed as the product of a symmetric positive definite matrix (i.e., a metric) and the negative gradient of a loss function. We also demonstrate that these metrics have a canonical form and identify several optimal ones, including the metric that achieves the minimum possible condition number. The proofs of the main results are straightforward, relying only on elementary linear algebra and calculus, and are applicable to continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions that explicitly depend on time.
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Rethinking The Uniformity Metric in Self-Supervised Learning [20.040558579232105]
Uniformity plays an important role in evaluating learned representations, providing insights into self-supervised learning.
We find that the uniformity metric proposed by citetWang 2020UnderstandingCR fails to satisfy the majority of these properties.
To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance.
arXiv Detail & Related papers (2024-03-01T16:22:05Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Efficient and Near-Optimal Smoothed Online Learning for Generalized
Linear Functions [28.30744223973527]
We give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/sigma) regret for realizable K-wise linear classification.
We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
arXiv Detail & Related papers (2022-05-25T21:31:36Z) - Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism [65.46524775457928]
offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Comparing Classes of Estimators: When does Gradient Descent Beat Ridge
Regression in Linear Models? [46.01087792062936]
We compare classes of estimators via the relative performance of the emphbest method in the class
This allows us to rigorously quantify the tuning sensitivity of learning algorithms.
arXiv Detail & Related papers (2021-08-26T16:01:37Z) - Learning Linearized Assignment Flows for Image Labeling [70.540936204654]
We introduce a novel algorithm for estimating optimal parameters of linearized assignment flows for image labeling.
We show how to efficiently evaluate this formula using a Krylov subspace and a low-rank approximation.
arXiv Detail & Related papers (2021-08-02T13:38:09Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Handling the Positive-Definite Constraint in the Bayesian Learning Rule [33.87717973872535]
The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces.
arXiv Detail & Related papers (2020-02-24T03:29:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.