Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients
- URL: http://arxiv.org/abs/2403.19448v2
- Date: Mon, 03 Feb 2025 21:40:30 GMT
- Title: Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients
- Authors: Johannes Müller, Semih Çaycı, Guido Montúfar
- Abstract summary: We study another natural gradient method based on the Fisher information matrix of the state-action distributions. We show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error.
- Score: 15.218434620361387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method based on the Fisher information matrix of the state-action distributions which has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program which improves existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
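To make the object of study concrete, here is a minimal numerical sketch (not taken from the paper): on the probability simplex, the Fisher-Rao gradient flow of a linear potential ⟨r, μ⟩ reduces to the replicator dynamics dμ_i/dt = μ_i (r_i − ⟨r, μ⟩), and discretizing it with explicit Euler steps illustrates the linear (i.e. exponential) decay of the suboptimality gap that the paper establishes for general linear programs. The reward vector, step size, and iteration count below are arbitrary illustrative choices, not values from the paper.

```python
# Sketch: Fisher-Rao gradient flow of a linear program over the probability simplex.
# In this special case the flow of the linear potential <r, mu> is the replicator
# dynamics, discretized here with explicit Euler steps.
import numpy as np

rng = np.random.default_rng(0)
r = rng.normal(size=5)        # arbitrary linear objective (reward vector)
mu = np.full(5, 1.0 / 5)      # initial point: uniform distribution on the simplex
dt = 0.01                     # explicit Euler step size
opt = r.max()                 # optimal value of max <r, mu> over the simplex

for t in range(20001):
    # Replicator / Fisher-Rao flow step for the linear potential <r, mu>
    mu = mu + dt * mu * (r - r @ mu)
    mu = np.clip(mu, 1e-300, None)
    mu = mu / mu.sum()        # re-normalize against numerical drift
    if t % 5000 == 0:
        print(f"t = {t * dt:7.1f}   suboptimality gap = {opt - r @ mu:.3e}")
```

Running this shows the gap ⟨r, μ*⟩ − ⟨r, μ(t)⟩ shrinking at a geometric rate until it reaches machine precision, which is the simplex analogue of the linear convergence rate (depending on the geometry of the linear program) stated in the abstract.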
Related papers
- Comparing regularisation paths of (conjugate) gradient estimators in ridge regression [0.0]
We consider gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimizing a penalized ridge criterion in linear regression.
In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor.
arXiv Detail & Related papers (2025-03-07T16:14:06Z)
- Gradient Equilibrium in Online Learning: Theory and Applications [56.02856551198923]
Gradient equilibrium is achieved by standard online learning methods.
Gradient equilibrium translates into an interpretable and meaningful property in online prediction problems.
We show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions.
arXiv Detail & Related papers (2025-01-14T18:59:09Z)
- Kernel Approximation of Fisher-Rao Gradient Flows [52.154685604660465]
We present a rigorous investigation of Fisher-Rao and Wasserstein type gradient flows concerning their gradient structures, flow equations, and their kernel approximations.
Specifically, we focus on the Fisher-Rao geometry and its various kernel-based approximations, developing a principled theoretical framework.
arXiv Detail & Related papers (2024-10-27T22:52:08Z)
- Corridor Geometry in Gradient-Based Optimization [11.177186975058047]
We show that corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory.
Using the linear decrease of the loss on corridors, we devise a learning rate adaptation scheme for gradient descent.
arXiv Detail & Related papers (2024-02-13T21:54:15Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback [22.21598324895312]
This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback control.
We derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods.
We prove that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima.
arXiv Detail & Related papers (2023-10-29T14:25:57Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces [10.777806006475297]
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space.
We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy.
arXiv Detail & Related papers (2023-10-04T16:41:36Z)
- Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties, but avoid exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
arXiv Detail & Related papers (2023-10-03T12:35:02Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z)
- Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization [18.072051868187934]
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations.
We show that the algorithm enjoys the same linear guarantees as in the deterministic case up to an error term.
arXiv Detail & Related papers (2022-09-30T11:17:44Z)
- Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, based on two properties that we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.