Policy Gradient Methods for Distortion Risk Measures
- URL: http://arxiv.org/abs/2107.04422v7
- Date: Mon, 5 Feb 2024 03:45:35 GMT
- Title: Policy Gradient Methods for Distortion Risk Measures
- Authors: Nithia Vijayan and Prashanth L.A
- Abstract summary: We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning framework.
We derive a variant of the policy gradient theorem that caters to the DRM objective, and integrate it with a likelihood ratio-based gradient estimation scheme.
- Score: 9.554545881355377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose policy gradient algorithms which learn risk-sensitive policies in
a reinforcement learning (RL) framework. Our proposed algorithms maximize the
distortion risk measure (DRM) of the cumulative reward in an episodic Markov
decision process in on-policy and off-policy RL settings, respectively. We
derive a variant of the policy gradient theorem that caters to the DRM
objective, and integrate it with a likelihood ratio-based gradient estimation
scheme. We derive non-asymptotic bounds that establish the convergence of our
proposed algorithms to an approximate stationary point of the DRM objective.
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence [15.720824593964027]
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications.
This paper introduces a policy gradient method for risk-sensitive DRL with general coherent risk measures.
We also design a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory gradient estimation.
arXiv Detail & Related papers (2024-05-23T16:16:58Z) - Regret Analysis of Policy Gradient Algorithm for Infinite Horizon
Average Reward Markov Decision Processes [38.879933964474326]
We consider an infinite horizon average reward Markov Decision Process (MDP)
We propose a policy gradient-based algorithm and show its global convergence property.
We prove that the proposed algorithm has $tildemathcalO(T3/4)$ regret.
arXiv Detail & Related papers (2023-09-05T03:22:46Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - A Policy Gradient Method for Confounded POMDPs [7.75007282943125]
We propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting.
We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data.
arXiv Detail & Related papers (2023-05-26T16:48:05Z) - Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective
Reinforcement Learning [17.916366827429034]
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z) - A policy gradient approach for optimization of smooth risk measures [8.087699764574788]
We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward.
We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings.
arXiv Detail & Related papers (2022-02-22T17:26:28Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Statistically Efficient Off-Policy Policy Gradients [80.42316902296832]
We consider the statistically efficient estimation of policy gradients from off-policy data.
We propose a meta-algorithm that achieves the lower bound without any parametric assumptions.
We establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
arXiv Detail & Related papers (2020-02-10T18:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.