Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal
Dynamic Regret, Adaptive Detection, and Separation Design
- URL: http://arxiv.org/abs/2211.10815v1
- Date: Sat, 19 Nov 2022 22:40:09 GMT
- Title: Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal
Dynamic Regret, Adaptive Detection, and Separation Design
- Authors: Yuhao Ding, Ming Jin, Javad Lavaei
- Abstract summary: We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs).
We propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets.
This work offers the first non-asymptotic theoretical analysis of non-stationary risk-sensitive RL in the literature.
- Score: 9.554944575754638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study risk-sensitive reinforcement learning (RL) based on an entropic risk
measure in episodic non-stationary Markov decision processes (MDPs). Both the
reward functions and the state transition kernels are unknown and allowed to
vary arbitrarily over time with a budget on their cumulative variations. When
this variation budget is known a priori, we propose two restart-based
algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic
regrets. Based on these results, we further present a meta-algorithm that does
not require any prior knowledge of the variation budget and can adaptively
detect the non-stationarity on the exponential value functions. A dynamic
regret lower bound is then established for non-stationary risk-sensitive RL to
certify the near-optimality of the proposed algorithms. Our results also show
that the risk control and the handling of the non-stationarity can be
separately designed in the algorithm if the variation budget is known a priori,
while the non-stationary detection mechanism in the adaptive algorithm depends
on the risk parameter. This work offers the first non-asymptotic theoretical
analysis of non-stationary risk-sensitive RL in the literature.
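For context, the entropic risk objective behind these results has a standard form; the following is a sketch in notation assumed here (not taken from the abstract), where beta is the risk parameter and H the episode horizon:

```latex
% Entropic risk value of a policy \pi (risk-seeking for \beta > 0,
% risk-averse for \beta < 0); recovers the expected return as \beta \to 0.
V^{\pi} = \frac{1}{\beta} \log \mathbb{E}^{\pi}\!\left[ \exp\!\left( \beta \sum_{h=1}^{H} r_h(s_h, a_h) \right) \right]

% Dynamic regret over K episodes measures each episode's policy \pi_k
% against that episode's own optimal policy \pi_k^{*}, so the benchmark
% itself drifts with the non-stationary MDP:
\mathrm{D\text{-}Regret}(K) = \sum_{k=1}^{K} \left( V_k^{\pi_k^{*}} - V_k^{\pi_k} \right)
```

The restart mechanism itself is easy to sketch. Below is a schematic wrapper (hypothetical names, not the paper's code): a fresh instance of the base risk-sensitive learner is started every `phase_length` episodes, so its statistics never mix data from MDPs that have drifted far apart; the paper tunes the phase length from the known variation budget, a calculation omitted here.

```python
def run_with_restarts(make_learner, num_episodes, phase_length):
    """Schematic restart wrapper (hypothetical names, not the paper's code).

    Restarting every `phase_length` episodes bounds how much the reward
    functions and transition kernels can have varied within the data any
    single learner instance sees.
    """
    learner = make_learner()
    for k in range(num_episodes):
        if k > 0 and k % phase_length == 0:
            learner = make_learner()  # discard stale, pre-drift statistics
        learner.run_episode(k)
```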
Related papers
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems.
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
- Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk [23.63388546004777]
We analyze the robustness of CVaR-based risk-sensitive RL under Robust Markov Decision Processes.
Motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets.
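As a pointer for readers, CVaR at level alpha in (0, 1) admits the standard Rockafellar-Uryasev variational form (textbook material, stated here for a loss variable X; not taken from this abstract):

```latex
\mathrm{CVaR}_{\alpha}(X) = \inf_{t \in \mathbb{R}} \left\{ t + \frac{1}{\alpha} \, \mathbb{E}\big[ (X - t)_{+} \big] \right\}
```

Robust formulations then take a worst case of such risk over an ambiguity set of transition kernels, which in this paper may depend on the state and action.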
arXiv Detail & Related papers (2024-05-02T20:28:49Z)
- Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty [5.710971447109951]
This paper studies continuous-time risk-sensitive reinforcement learning (RL).
I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation.
I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure.
arXiv Detail & Related papers (2024-04-19T03:05:41Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
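A minimal sketch of how a variance estimate over values can drive either risk attitude (a hypothetical utility, assumed here for illustration rather than the paper's exact objective):

```python
import numpy as np

def risk_adjusted_value(q_mean, q_variance, risk_weight):
    """Hypothetical illustration, not the paper's exact objective: combine
    the posterior mean of Q with its epistemic standard deviation.  A
    positive risk_weight rewards uncertainty (risk-seeking, optimistic
    exploration); a negative one penalizes it (risk-averse); zero recovers
    plain mean maximization.
    """
    return q_mean + risk_weight * np.sqrt(q_variance)
```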
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative rewards.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
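A minimal sketch of the epsilon-covering idea (hypothetical code, simplified to a uniform grid): the continuum of achievable cumulative rewards is replaced by a finite grid, so a dynamic-programming algorithm can enumerate the augmented state (state, cumulative reward so far).

```python
import numpy as np

def cumulative_reward_grid(r_max, horizon, epsilon):
    """Uniform epsilon-grid over [0, horizon * r_max]; assumes per-step
    rewards in [0, r_max], so every achievable cumulative reward lies
    within epsilon of a grid point."""
    return np.arange(0.0, horizon * r_max + epsilon, epsilon)

def snap_to_grid(cum_reward, grid):
    """Project a running cumulative reward onto the nearest grid point, so
    the augmented state space stays finite."""
    return grid[np.argmin(np.abs(grid - cum_reward))]
```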
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- The Unreasonable Effectiveness of Deep Evidential Regression [72.30888739450343]
A new approach with uncertainty-aware regression-based neural networks (NNs) shows promise over traditional deterministic methods and typical Bayesian NNs.
We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification.
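For reference, in the Normal-Inverse-Gamma parameterization (gamma, nu, alpha, beta) that Deep Evidential Regression predicts (standard formulas from the evidential regression literature, with alpha > 1; not taken from this abstract), the point prediction and the two uncertainties the method is meant to separate are:

```latex
\mathbb{E}[\mu] = \gamma, \qquad
\underbrace{\mathbb{E}[\sigma^{2}] = \frac{\beta}{\alpha - 1}}_{\text{aleatoric}}, \qquad
\underbrace{\mathrm{Var}[\mu] = \frac{\beta}{\nu(\alpha - 1)}}_{\text{epistemic}}
```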
arXiv Detail & Related papers (2022-05-20T10:10:32Z)
- Multivariate Deep Evidential Regression [77.34726150561087]
A new approach with uncertainty-aware neural networks shows promise over traditional deterministic methods.
We discuss three issues with a proposed solution to extract aleatoric and epistemic uncertainties from regression-based neural networks.
arXiv Detail & Related papers (2021-04-13T12:20:18Z)
- A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst-case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as its recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
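For concreteness, the practical IRMv1 objective from the original proposal (standard form, assumed to be the objective referenced here) sums the per-environment risk R^e over training environments E plus a gradient penalty enforcing invariance of a fixed "dummy" classifier w = 1.0 on top of the representation Phi:

```latex
\min_{\Phi} \; \sum_{e \in \mathcal{E}} \left[ R^{e}(\Phi) + \lambda \left\| \nabla_{w \,\mid\, w = 1.0} \, R^{e}(w \cdot \Phi) \right\|^{2} \right]
```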
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism [25.20231604057821]
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity.
We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm.
We propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound.
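A minimal sketch of the sliding-window idea behind such algorithms (hypothetical code, omitting the confidence-widening part that is specific to SWUCRL2-CW): estimates are built only from the most recent transitions, so data from a drifted MDP is eventually forgotten.

```python
from collections import deque

class SlidingWindowEstimator:
    """Hypothetical sketch: empirical reward estimates over a sliding
    window of the last `window` transitions."""

    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # (state, action, reward) tuples

    def record(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def mean_reward(self, state, action):
        """Windowed empirical mean; an optimistic algorithm would add a
        confidence bonus shrinking with the windowed visit count."""
        rewards = [r for (s, a, r) in self.buffer if (s, a) == (state, action)]
        return sum(rewards) / len(rewards) if rewards else 0.0
```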
arXiv Detail & Related papers (2020-06-24T15:40:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.