Efficient Risk-sensitive Planning via Entropic Risk Measures
- URL: http://arxiv.org/abs/2502.20423v1
- Date: Thu, 27 Feb 2025 09:56:51 GMT
- Title: Efficient Risk-sensitive Planning via Entropic Risk Measures
- Authors: Alexandre Marthe, Samuel Bounan, Aurélien Garivier, Claire Vernade,
- Abstract summary: We show that only Entropic Risk Measures (EntRM) can be efficiently optimized through dynamic programming. We prove that this optimality front can be computed effectively thanks to a novel structural analysis and smoothness properties of entropic risks.
- Score: 51.42922439693624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Risk-sensitive planning aims to identify policies maximizing some tail-focused metrics in Markov Decision Processes (MDPs). Such an optimization task can be very costly for the most widely used and interpretable metrics such as threshold probabilities or (Conditional) Values at Risk. Indeed, previous work showed that only Entropic Risk Measures (EntRM) can be efficiently optimized through dynamic programming, leaving a hard-to-interpret parameter to choose. We show that the computation of the full set of optimal policies for EntRM across parameter values leads to tight approximations for the metrics of interest. We prove that this optimality front can be computed effectively thanks to a novel structural analysis and smoothness properties of entropic risks. Empirical results demonstrate that our approach achieves strong performance in a variety of decision-making scenarios.
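The abstract's claim that EntRM can be optimized by dynamic programming can be illustrated with a minimal sketch. It assumes the common convention EntRM_beta(X) = (1/beta) log E[exp(beta X)], with beta < 0 risk-averse and beta = 0 the plain expectation, a finite-horizon tabular MDP, and deterministic rewards. The function names, the array layout, and the toy MDP are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def entropic_risk(values, probs, beta):
    """Assumed convention: (1/beta) * log E[exp(beta * X)]; beta == 0 is the mean."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(probs @ values)
    z = beta * values
    m = z.max()  # shift before exponentiating, for numerical stability
    return float((m + np.log(probs @ np.exp(z - m))) / beta)

def entropic_value_iteration(P, R, horizon, beta):
    """Backward induction for one fixed beta.

    P[s, a, s2] holds transition probabilities, R[s, a] deterministic rewards
    (hypothetical layout). Returns the value at step 0 and a greedy policy
    indexed as policy[h, s].
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Translation invariance lets the sure reward leave the risk measure.
                Q[s, a] = R[s, a] + entropic_risk(V, P[s, a], beta)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

# Toy MDP (hypothetical): from state 0, action 0 leads surely to state 1
# (reward 0.45); action 1 leads to state 2 (reward 1) or state 3 (reward 0)
# with equal probability. States 1, 2, 3 are absorbing.
P = np.zeros((4, 2, 4))
P[0, 0, 1] = 1.0
P[0, 1, 2] = 0.5
P[0, 1, 3] = 0.5
for s in (1, 2, 3):
    P[s, :, s] = 1.0
R = np.array([[0.0, 0.0], [0.45, 0.45], [1.0, 1.0], [0.0, 0.0]])

V_neutral, pi_neutral = entropic_value_iteration(P, R, horizon=2, beta=0.0)
V_averse, pi_averse = entropic_value_iteration(P, R, horizon=2, beta=-2.0)
```

In this toy example the risk-neutral run (beta = 0) picks the gamble in state 0, while beta = -2 picks the sure reward. Sweeping beta over a grid and collecting the distinct greedy policies gives a crude stand-in for the optimality front the abstract describes; the paper's own method for computing that front is not reproduced here.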
Related papers
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems.
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.069335774032178]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP).
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Risk-sensitive Actor-free Policy via Convex Optimization [0.0]
Traditional reinforcement learning methods optimize agents without considering safety, potentially resulting in unintended consequences.
We propose an optimal actor-free policy that optimizes a risk-sensitive criterion based on the conditional value at risk.
arXiv Detail & Related papers (2023-06-30T21:20:04Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs [4.87191262649216]
Planning in Markov decision processes (MDPs) typically optimises the expected cost.
An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR).
We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal.
arXiv Detail & Related papers (2021-10-25T09:16:50Z)
- RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation [12.600828753197204]
We introduce Risk-Aware Planning using PyTorch (RAPTOR), a novel framework for risk-sensitive planning through end-to-end optimization of the entropic utility objective.
We evaluate and compare these two forms of RAPTOR on three highly stochastic domains, including nonlinear navigation, HVAC control, and linear reservoir control.
arXiv Detail & Related papers (2021-06-14T09:27:19Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy that maximizes the worst-case expected objective value.
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.