GradientDICE: Rethinking Generalized Offline Estimation of Stationary
Values
- URL: http://arxiv.org/abs/2001.11113v7
- Date: Thu, 26 Nov 2020 17:49:45 GMT
- Title: GradientDICE: Rethinking Generalized Offline Estimation of Stationary
Values
- Authors: Shangtong Zhang, Bo Liu, Shimon Whiteson
- Abstract summary: We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state-of-the-art for estimating such density ratios.
- Score: 75.17074235764757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present GradientDICE for estimating the density ratio between the state
distribution of the target policy and the sampling distribution in off-policy
reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang
et al., 2020), the state-of-the-art for estimating such density ratios. Namely,
the optimization problem in GenDICE is not a convex-concave saddle-point
problem once nonlinearity in optimization variable parameterization is
introduced to ensure positivity, so any primal-dual algorithm is not guaranteed
to converge or find the desired solution. However, such nonlinearity is
essential to ensure the consistency of GenDICE even with a tabular
representation. This is a fundamental contradiction, resulting from GenDICE's
original formulation of the optimization problem. In GradientDICE, we optimize
a different objective from GenDICE by using the Perron-Frobenius theorem and
eliminating GenDICE's use of divergence. Consequently, nonlinearity in
parameterization is not necessary for GradientDICE, which is provably
convergent under linear function approximation.
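The Perron-Frobenius connection above can be illustrated on a small tabular example: for an irreducible stochastic matrix, the stationary distribution of the target policy is the unique positive left eigenvector with eigenvalue 1, and the density ratio against the sampling distribution follows directly. The sketch below uses a hypothetical 3-state chain and computes the ratio exactly; it is an illustration of the quantity DICE-style methods estimate from samples, not the GradientDICE algorithm itself:

```python
import numpy as np

# Hypothetical 3-state transition matrix under the target policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution d_pi: left eigenvector of P for eigenvalue 1.
# Perron-Frobenius guarantees it is unique and positive for an
# irreducible stochastic matrix.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
d_pi = np.real(eigvecs[:, idx])
d_pi = d_pi / d_pi.sum()

# Sampling (behavior) distribution over states, assumed known here.
d_mu = np.array([0.4, 0.4, 0.2])

# Density ratio tau(s) = d_pi(s) / d_mu(s).
tau = d_pi / d_mu

# Sanity check: E_{s ~ d_mu}[tau(s)] = 1 by construction.
assert abs(np.dot(d_mu, tau) - 1.0) < 1e-8
```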
Related papers
- Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent [14.19520637866741]
We establish non-asymptotic convergence rates in the central limit theorem for Polyak-Ruppert-averaged iterates of gradient descent.
We prove the non-asymptotic validity of the multiplier bootstrap for constructing the confidence sets for an optimization problem.
arXiv Detail & Related papers (2025-02-10T17:49:05Z) - Gradient-Based Non-Linear Inverse Learning [2.6149030745627644]
We study statistical inverse learning in the context of nonlinear inverse problems under random design.
We employ gradient descent (GD) and stochastic gradient descent (SGD) with mini-batching, both using constant step sizes.
Our analysis derives convergence rates for both algorithms under classical a priori assumptions on the smoothness of the target function.
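The mini-batch SGD scheme that the blurb above refers to can be sketched in a few lines; this is a generic constant-step-size version on a synthetic least-squares problem, not the paper's inverse-learning setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (illustrative only).
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def sgd(step=0.05, batch=20, epochs=200):
    """Mini-batch SGD with a constant step size on squared loss."""
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            # Average gradient of 0.5*(Xw - y)^2 over the mini-batch.
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= step * grad
    return w

w_hat = sgd()
```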
arXiv Detail & Related papers (2024-12-21T22:38:17Z) - Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions [18.47705532817026]
We show that AdaGrad outperforms SGD by a factor of $d$ under certain conditions.
Motivated by this, we introduce assumptions on the smoothness structure of the objective and the gradient variance.
arXiv Detail & Related papers (2024-06-07T02:55:57Z) - Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent and makes the inductive bias of the model clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded
Gradients and Affine Variance [46.15915820243487]
We show that AdaGrad-Norm exhibits an order-optimal convergence rate of $\mathcal{O}\left(\mathrm{poly}\log(T)/\sqrt{T}\right)$ after $T$ iterations.
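The AdaGrad-Norm scheme analyzed above uses a single scalar step size driven by the accumulated squared gradient norms. A minimal sketch, with illustrative hyperparameters that are not taken from the paper:

```python
import numpy as np

def adagrad_norm(grad, w0, eta=1.0, b0=1e-2, steps=500):
    """AdaGrad-Norm: w <- w - eta / sqrt(b0^2 + sum_t ||g_t||^2) * g_t."""
    w = np.array(w0, dtype=float)
    acc = b0 ** 2
    for _ in range(steps):
        g = grad(w)
        acc += np.dot(g, g)       # accumulate squared gradient norms
        w -= eta / np.sqrt(acc) * g
    return w

# Quadratic test objective f(w) = 0.5 * ||w||^2, whose gradient is w.
w_final = adagrad_norm(lambda w: w, w0=[3.0, -2.0])
```

Because the step size shrinks only as fast as the gradients themselves accumulate, no manual tuning of the learning-rate decay is needed.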
arXiv Detail & Related papers (2022-02-11T17:37:54Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
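The core ZO primitive is estimating a gradient from function values alone. Below is a standard two-point random-direction estimator, which the paper's coordinate-importance-sampling design builds on; the estimator and test function here are generic, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

def zo_gradient(f, x, mu=1e-4, n_dirs=20):
    """Two-point zeroth-order gradient estimate: average random
    directional finite differences u * (f(x+mu*u) - f(x-mu*u)) / (2*mu)."""
    d = len(x)
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Check on a quadratic: the true gradient of 0.5*||x||^2 is x itself.
x = np.array([1.0, -2.0, 0.5])
g_hat = zo_gradient(lambda z: 0.5 * np.dot(z, z), x, n_dirs=5000)
```

Since $\mathbb{E}[uu^\top] = I$ for Gaussian directions, the estimate is unbiased up to $O(\mu)$ smoothing error; each estimate costs `2 * n_dirs` function queries, which is the cost the paper's importance sampling aims to reduce.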
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We extend the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove the convergence of adaptive gradient methods to first-order stationary points in expectation.
Our analyses shed light on better understanding adaptive gradient methods in optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.