Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path
- URL: http://arxiv.org/abs/2206.02678v2
- Date: Thu, 11 May 2023 06:19:14 GMT
- Title: Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path
- Authors: Yihan Du, Siwei Wang, Longbo Huang
- Abstract summary: We study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step.
This formulation is applicable to real-world tasks that demand strong risk avoidance throughout the decision process.
- Score: 40.4378338001229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study a novel episodic risk-sensitive Reinforcement
Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail
of the reward-to-go at each step, and focuses on tightly controlling the risk
of getting into catastrophic situations at each stage. This formulation is
applicable to real-world tasks that demand strong risk avoidance throughout the
decision process, such as autonomous driving, clinical treatment planning and
robotics. We investigate two performance metrics under Iterated CVaR RL, i.e.,
Regret Minimization and Best Policy Identification. For both metrics, we design
efficient algorithms ICVaR-RM and ICVaR-BPI, respectively, and provide nearly
matching upper and lower bounds with respect to the number of episodes $K$. We
also investigate an interesting limiting case of Iterated CVaR RL, called Worst
Path RL, where the objective becomes to maximize the minimum possible
cumulative reward. For Worst Path RL, we propose an efficient algorithm with
constant upper and lower bounds. Finally, our techniques for bounding the
change of CVaR due to the value function shift and decomposing the regret via a
distorted visitation distribution are novel, and can find applications in other
risk-sensitive RL problems.
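For readers new to the objective, the following is a minimal sketch of the underlying quantities, written in standard notation rather than the paper's: the Rockafellar-Uryasev form of lower-tail CVaR at risk level $\alpha$, and an iterated, per-step Bellman-style recursion consistent with the abstract's description (the symbols $r_h$, $P_h$, $Q_h$, and $V_h$ are illustrative assumptions, not necessarily the paper's exact formulation).
$$\mathrm{CVaR}_{\alpha}(X) \;=\; \sup_{x \in \mathbb{R}} \Big\{ x - \tfrac{1}{\alpha}\, \mathbb{E}\big[(x - X)^{+}\big] \Big\}$$
$$V_{H+1}(s) = 0, \qquad Q_h(s,a) = r_h(s,a) + \mathrm{CVaR}_{\alpha}^{\, s' \sim P_h(\cdot \mid s,a)}\big(V_{h+1}(s')\big), \qquad V_h(s) = \max_{a} Q_h(s,a)$$
As $\alpha \to 0$, $\mathrm{CVaR}_{\alpha}$ approaches the essential infimum, which is how the Worst Path RL limit (maximizing the minimum possible cumulative reward) arises from the same recursion.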
Related papers
- The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting [11.834850394160608]
In real-world RL applications, human-in-the-loop decisions between tasks often result in non-stationarity.
Our results show that task non-stationarity leads to a more restrictive trade-off between cumulative regret (CR) and simple regret (SR).
These findings are practically significant, indicating that increased exploration is necessary in non-stationary environments to accommodate task changes.
arXiv Detail & Related papers (2024-03-16T15:29:22Z) - Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk [16.176812250762666]
An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method.
To achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods are required to be sample efficient.
We propose novel surrogate functions, in which the effect of the distributional shift can be reduced, and introduce an adaptive trust-region constraint that keeps the policy from deviating far from the replay buffers.
arXiv Detail & Related papers (2023-12-01T04:29:19Z) - Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL) with the CVaR objective in low-rank MDPs.
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay between exploration, exploitation, and representation learning in CVaR RL.
We prove a sample complexity bound for our algorithm to learn an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the size of the action space, and $d$ is the dimension of the representations.
arXiv Detail & Related papers (2023-11-20T17:44:40Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Is Risk-Sensitive Reinforcement Learning Properly Resolved? [32.42976780682353]
We propose a novel algorithm, namely Trajectory Q-Learning (TQL), for risk-sensitive RL (RSRL) problems with provable convergence to the optimal policy.
Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies.
arXiv Detail & Related papers (2023-07-02T11:47:21Z) - Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z) - Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.