Related papers: Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics

Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics

URL: http://arxiv.org/abs/2304.09123v3
Date: Wed, 15 Jan 2025 02:19:34 GMT
Title: Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
Authors: Luke Snow, Vikram Krishnamurthy,
Abstract summary: This paper provides a finite-sample analysis of a passive gradient Langevin dynamics (PSGLD) algorithm.<n> Adaptive IRL aims to estimate the cost function of a forward learner performing a gradient algorithm.
Score: 13.440621354486906
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm. This algorithm is designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing their estimates in real-time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (forward learner), of which it has no control. The PSGLD algorithm acts as a randomized sampler to achieve adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance; we provide explicit bounds on the 2-Wasserstein distance between PSGLD algorithm sample measure and the stationary measure encoding the cost function, and provide guarantees for a kernel density estimation scheme which reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance. They provide finite-time guarantees for an adaptive IRL mechanism, and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.

Related papers

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient [6.563379950720334]
We introduce a new Langevin dynamics based algorithm, called e-TH$varepsilon$O POULA, to solve optimization problems with discontinuous gradients. Three key applications in finance and insurance are provided, namely, multi-period portfolio optimization, transfer learning in multi-period portfolio optimization, and insurance claim prediction.
arXiv Detail & Related papers (2022-10-24T13:10:06Z)
Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensionals of a family of first order gradient-based methods. This includes widely used algorithms such as gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems. In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdientritHm (SARAH) algorithm is a variance reduced variant of the Gradient Descent (SGD) algorithm. In this paper, we remove the necessity of a full gradient. The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function [3.5044892799305956]
We provide a non-asymptotic analysis for the tamed un-adjusted Langevin algorithm (TUSLA) introduced in Lovas et al. In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wassersteinstein-1-2. We show that the TUSLA algorithm converges rapidly to the optimal solution.
arXiv Detail & Related papers (2021-07-19T07:13:02Z)
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution. We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions [17.14287157979558]
We propose an adaptively weighted gradient Langevin dynamics (SGLD) for learning in big data statistics. The proposed algorithm is tested on benchmark datasets including CIFAR100.
arXiv Detail & Related papers (2020-10-19T19:20:47Z)
Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms [21.796874356469644]
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response. We present a generalized Langevin dynamics to estimate the reward function $R(theta)$. The proposed IRL algorithms use kernel-based passive learning schemes and generate samples from the distribution proportional to $exp(R(theta)$.
arXiv Detail & Related papers (2020-06-20T23:12:11Z)
Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error. Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.