Related papers: Conditionally Risk-Averse Contextual Bandits

Conditionally Risk-Averse Contextual Bandits

URL: http://arxiv.org/abs/2210.13573v2
Date: Sat, 8 Jul 2023 15:33:53 GMT
Title: Conditionally Risk-Averse Contextual Bandits
Authors: M\'onika Farsang and Paul Mineiro and Wangda Zhang
Abstract summary: Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations. We present the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments from diverse scenarios where worst-case outcomes should be avoided.
Score: 8.894935073145252
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments from diverse scenarios where worst-case outcomes should be avoided, from dynamic pricing, inventory management, and self-tuning software; including a production exascale data processing system.

Related papers

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models [29.569220030102986]
We introduce textbfBeyond Safe Answers (BSA) bench, a novel benchmark comprising 2,000 challenging instances organized into three distinct SSA scenario types.<n> Evaluations of 19 state-of-the-art LRMs demonstrate the difficulty of this benchmark, with top-performing models achieving only 38.0% accuracy in correctly identifying risk rationales.<n>Our work provides a comprehensive assessment tool for evaluating and improving safety reasoning fidelity in LRMs, advancing the development of genuinely risk-aware and reliably safe AI systems.
arXiv Detail & Related papers (2025-05-26T08:49:19Z)
Data-driven decision-making under uncertainty with entropic risk measure [5.407319151576265]
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss. To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure. We show that cross validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
arXiv Detail & Related papers (2024-09-30T04:02:52Z)
Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively. Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
SafeAR: Safe Algorithmic Recourse by Risk-Aware Policies [2.291948092032746]
We present a method to compute recourse policies that consider variability in cost. We show how existing recourse desiderata can fail to capture the risk of higher costs.
arXiv Detail & Related papers (2023-08-23T18:12:11Z)
A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests. We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z)
Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it. We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR) We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
Near-Optimal MNL Bandits Under Risk Criteria [13.251377915797674]
We study MNL bandits, which is a variant of the traditional multi-armed bandit problem, under risk criteria. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio and entropy risk, and prove that they suffer a near-optimal regret.
arXiv Detail & Related papers (2020-09-26T03:24:40Z)
Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss. We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.