Related papers: EVaR-Optimal Arm Identification in Bandits

Related papers

Active Bipartite Ranking with Smooth Posterior Distributions [1.9838140219494644]
Bipartite ranking is a statistical learning problem involved in many applications and widely studied in the passive context.<n>We propose a novel algorithm, referred to as smooth-rank and designed for the continuous setting, which aims to minimise the distance between the ROC curve of the estimated ranking rule and the optimal one w.r.t. the $sup$ norm.<n>We provide a problem dependent upper bound on the expected sampling time of smooth-rank and establish a problem dependent lower bound on the expected sampling time of any PAC$(,)$ algorithm.
arXiv Detail & Related papers (2026-02-27T18:32:08Z)
Almost Asymptotically Optimal Active Clustering Through Pairwise Observations [59.20614082241528]
We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses.<n>We establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the accuracy of the clustering.<n>We develop a computationally feasible variant of the Generalized Likelihood Ratio statistic and show that its performance gap to the lower bound can be accurately empirically estimated.
arXiv Detail & Related papers (2026-02-05T14:16:47Z)
Constrained Pareto Set Identification with Bandit Feedback [10.967572582187014]
Given a $K$-armed bandit with unknown means, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm.<n>Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms.
arXiv Detail & Related papers (2025-06-09T18:29:28Z)
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances [7.695578200868269]
We consider the distributionally robust optimization (DRO) model of principal component analysis (PCA) to account for uncertainty in the underlying probability distribution.<n>The resulting formulation leads to a nonsmooth constrained min-max optimization problem, where the ambiguity set captures the distributional uncertainty by the type-$2$ Wasserstein distance.<n>This explicit characterization equivalently reformulates the original DRO model into a minimization problem on the Stiefel manifold with intricate nonsmooth terms.
arXiv Detail & Related papers (2025-03-04T11:00:08Z)
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning.<n>For reverse Kullback-Leibler (KL) divergence, we give the first $tildeO(epsilon-1)$ sample complexity under single-policy concentrability.<n>We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z)
Optimal Rates for Robust Stochastic Convex Optimization [12.620782629498812]
We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $epsilon$-contamination model.<n>Our algorithms do not require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions.<n>We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO.
arXiv Detail & Related papers (2024-12-15T00:52:08Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
Beyond No Regret: Instance-Dependent PAC Reinforcement Learning [29.552894877883883]
We show that there exists a tradeoff between achieving low regret and identifying an $epsilon$-optimal policy at the instance-optimal rate. We propose and analyze a novel, planning-based algorithm which attains this sample complexity. We show that our algorithm is nearly minimax optimal, and on several examples that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
arXiv Detail & Related papers (2021-08-05T16:34:17Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator. We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors. Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.