Risk-aware Stochastic Shortest Path
- URL: http://arxiv.org/abs/2203.01640v1
- Date: Thu, 3 Mar 2022 10:59:54 GMT
- Title: Risk-aware Stochastic Shortest Path
- Authors: Tobias Meggendorfer
- Abstract summary: We treat the problem of risk-aware control for stochastic shortest path (SSP) on Markov decision processes (MDPs).
We present an alternative view, instead optimizing conditional value-at-risk (CVaR), an established risk measure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We treat the problem of risk-aware control for stochastic shortest path (SSP)
on Markov decision processes (MDPs). Typically, expectation is considered for
SSP, which however is oblivious to the incurred risk. We present an alternative
view, instead optimizing conditional value-at-risk (CVaR), an established risk
measure. We treat both Markov chains and MDPs and introduce, through
novel insights, two algorithms, based on linear programming and value
iteration, respectively. Both algorithms offer precise and provably correct
solutions. Evaluation of our prototype implementation shows that risk-aware
control is feasible on several moderately sized models.
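As a point of reference for the objective being optimized, a minimal sketch follows: the CVaR at level alpha of the accumulated SSP cost is the expected cost over the worst alpha-fraction of runs. The toy Markov chain, the unit step cost, and the Monte-Carlo estimator below are hypothetical illustrations only; they are not the paper's linear-programming or value-iteration algorithms, which compute the same quantity exactly.
```python
import numpy as np

# Hypothetical 3-state Markov chain (not from the paper): states 0 and 1 are
# transient, state 2 is the absorbing goal; every step incurs a cost of 1.
P = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.0, 0.0, 1.0],
])

def sample_total_cost(rng, start=0, step_cost=1.0, max_steps=10_000):
    """Accumulated cost of one run until the goal state is reached."""
    s, cost = start, 0.0
    for _ in range(max_steps):
        if s == 2:
            return cost
        cost += step_cost
        s = rng.choice(3, p=P[s])
    return cost  # truncated run; acceptable for a rough illustration

def empirical_cvar(costs, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled costs (higher = worse)."""
    worst_first = np.sort(costs)[::-1]
    k = max(1, int(np.ceil(alpha * len(costs))))
    return worst_first[:k].mean()

rng = np.random.default_rng(0)
costs = np.array([sample_total_cost(rng) for _ in range(20_000)])
print("expected total cost:", costs.mean())
print("CVaR_0.1 total cost:", empirical_cvar(costs, alpha=0.1))
```
The gap between the two printed numbers is exactly the kind of risk that expectation-based SSP control ignores and that the paper's CVaR formulation accounts for.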
Related papers
- Robust Stochastic Shortest-Path Planning via Risk-Sensitive Incremental Sampling [9.651071174735804]
We propose a risk-aware Rapidly-Exploring Random Trees (RRT*) planning algorithm for Stochastic Shortest-Path (SSP) problems.
Our motivation rests on the step-wise coherence of the Conditional Value-at-Risk (CVaR) risk measure and the optimal substructure of the SSP problem.
Our simulation results show that incorporating risk into the tree growth process yields paths with lengths that are significantly less sensitive to variations in the noise parameter.
arXiv Detail & Related papers (2024-08-16T11:21:52Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents [3.8980564330208662]
We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
arXiv Detail & Related papers (2023-01-30T01:22:31Z) - Risk-Averse MDPs under Reward Ambiguity [9.929659318167731]
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity.
A scalable first-order algorithm is designed to solve large-scale problems.
arXiv Detail & Related papers (2023-01-03T11:06:30Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Risk-Averse Stochastic Shortest Path Planning [25.987787625028204]
We show that optimal, stationary, Markovian policies exist and can be found via a special Bellman's equation.
A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
arXiv Detail & Related papers (2021-03-26T20:49:14Z) - Risk-Averse Bayes-Adaptive Reinforcement Learning [3.5289688061934963]
We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs).
We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs.
Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
arXiv Detail & Related papers (2021-02-10T22:34:33Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
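To make the setting of the last entry concrete, here is a minimal, hypothetical Thompson-Sampling-style loop for a mean-variance multi-armed bandit. The Gaussian posterior model, the risk weight rho, and the mean-minus-rho-times-variance objective are generic illustration choices, not necessarily the algorithm analysed in that paper.
```python
import numpy as np

def ts_mean_variance(arm_means, arm_stds, rho=1.0, horizon=2000, seed=0):
    """Hypothetical Thompson-Sampling-style loop for a mean-variance MAB.

    Each arm's mean gets a Normal posterior (unit-variance observation model);
    the played arm maximizes sampled_mean - rho * empirical_variance.
    This is a generic illustration, not the cited paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts, sums, sq_sums = np.zeros(k), np.zeros(k), np.zeros(k)

    for t in range(horizon):
        if t < k:                      # play each arm once to initialise
            a = t
        else:
            post_mean = sums / counts
            post_var = 1.0 / counts    # posterior variance over the mean shrinks
            sampled_mean = rng.normal(post_mean, np.sqrt(post_var))
            emp_var = sq_sums / counts - post_mean ** 2
            a = int(np.argmax(sampled_mean - rho * emp_var))
        r = rng.normal(arm_means[a], arm_stds[a])
        counts[a] += 1
        sums[a] += r
        sq_sums[a] += r ** 2
    return counts

# A high-mean/high-variance arm vs. a slightly lower-mean/low-variance arm:
# with rho = 1.0 the loop shifts most pulls toward the low-variance arm.
print(ts_mean_variance(arm_means=[1.0, 0.9], arm_stds=[2.0, 0.1], rho=1.0))
```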
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences.