Related papers: Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

URL: http://arxiv.org/abs/2506.22253v1
Date: Fri, 27 Jun 2025 14:21:03 GMT
Title: Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence
Authors: Shunta Nonaga, Koji Tabata, Yuta Mizuno, Tamiki Komatsuzaki,
Abstract summary: We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty.<n>We propose a unified meta-budgetalgorithmic framework capable of operating under both fixed-confidence and fixed-optimal regimes.<n>Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
Score: 0.562479170374811
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the mean-variance(MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.

Related papers

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO)<n>RRPO operates directly on the primal problem without relying on dual formulations.<n>We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
Robust Satisficing Gaussian Process Bandits Under Adversarial Attacks [7.701333337093469]
We consider a robust satisficing objective, where the goal is to consistently achieve a predefined performance threshold $tau$, even under adversarial conditions.<n>We propose two novel algorithms based on distinct formulations of robust satisficing, and show that they are instances of a general robust satisficing framework.<n>Specifically, we derive two regret bounds: one that is sublinear over time, assuming certain conditions on the adversary and the satisficing threshold $tau$, and another that scales with the perturbation magnitude but requires no assumptions on the adversary.
arXiv Detail & Related papers (2025-06-02T13:04:18Z)
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.<n>We provide theoretical insights by deriving sub-optimality bounds of our satisficing based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z)
An Optimisation Framework for Unsupervised Environment Design [88.29733214939544]
unsupervised environment design (UED) aims to maximise agent's general robustness.<n>We provide a provably convergent algorithm in the zero-sum setting.<n>We empirically verify the efficacy of our method.
arXiv Detail & Related papers (2025-05-27T03:07:26Z)
SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU)<n>We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level.<n>Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z)
End-to-end Conditional Robust Optimization [6.363653898208231]
Conditional Robust Optimization (CRO) combines uncertainty quantification with robust optimization to promote safety and reliability in high stake applications. We propose a novel end-to-end approach to train a CRO model in a way that accounts for both the empirical risk of the prescribed decisions and the quality of conditional coverage of the contextual uncertainty set that supports them. We show that the proposed training algorithms produce decisions that outperform the traditional estimate then optimize approaches.
arXiv Detail & Related papers (2024-03-07T17:16:59Z)
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values. We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis [24.64654924173679]
Asynchronous distributed algorithm named Asynchronous Single-looP alternatIve gRadient projEction is proposed. New uncertainty set, i.e., constrained D-norm uncertainty set, is developed to leverage the prior distribution and flexibly control the degree of robustness. empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, but also remain robust against data as well as malicious attacks.
arXiv Detail & Related papers (2023-07-25T01:56:57Z)
A Distribution Optimization Framework for Confidence Bounds of Risk Measures [23.46659319363579]
We present a distribution optimization framework that significantly improves confidence bounds for various risk measures compared to previous methods. Our framework encompasses popular risk measures such as the entropic risk measure, conditional value at risk (CVaR), spectral risk measure, distortion risk measure, equivalent certainty, and rank-dependent expected utility.
arXiv Detail & Related papers (2023-06-12T12:13:06Z)
Distributed Distributionally Robust Optimization with Non-Convex Objectives [24.64654924173679]
Asynchronous distributed algorithm named Asynchronous Single-looP alternatIve gRadient projEction is proposed. New uncertainty set, i.e., constrained D-norm uncertainty set, is developed to leverage the prior distribution and flexibly control the degree of robustness. empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, but also remain robust against data as well as malicious attacks.
arXiv Detail & Related papers (2022-10-14T07:39:13Z)
Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss. We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.