Problem Dependent View on Structured Thresholding Bandit Problems
- URL: http://arxiv.org/abs/2106.10166v1
- Date: Fri, 18 Jun 2021 15:01:01 GMT
- Title: Problem Dependent View on Structured Thresholding Bandit Problems
- Authors: James Cheshire, Pierre Ménard, Alexandra Carpentier
- Abstract summary: We investigate the problem dependent regime in the Thresholding Bandit problem (TBP)
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
- Score: 73.70176003598449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the problem dependent regime in the stochastic Thresholding
Bandit problem (TBP) under several shape constraints. In the TBP, the objective
of the learner is to output, at the end of a sequential game, the set of arms
whose means are above a given threshold. The vanilla, unstructured, case is
already well studied in the literature. Taking $K$ as the number of arms, we
consider the case where (i) the sequence of arm means $(\mu_k)_{k=1}^K$ is
monotonically increasing (MTBP) and (ii) the case where $(\mu_k)_{k=1}^K$ is
concave (CTBP). We consider both cases in the problem dependent regime and
study the probability of error, i.e. the probability of misclassifying at least
one arm. In the fixed budget setting, we provide upper and lower bounds for the
probability of error in both the concave and monotone settings, as well as
associated algorithms. In both settings the bounds match in the problem
dependent regime up to universal constants in the exponential.
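To make the setting concrete, below is a minimal Python sketch of a fixed-budget TBP instance. The specific numbers ($K$, $T$, the threshold, the monotone means) and the uniform round-robin allocation are illustrative assumptions only, not the algorithms analysed in the paper; the sketch just shows what "output the set of arms above the threshold" and "misclassify at least one arm" mean operationally.

```python
import numpy as np

# Sketch of the fixed-budget Thresholding Bandit Problem (TBP):
# pull arms for T rounds, then output the estimated set of arms whose
# means lie above the threshold tau. Uniform allocation is a naive
# baseline, not the paper's algorithm.

rng = np.random.default_rng(0)

K, T, tau = 8, 4000, 0.5
# Hypothetical instance with monotonically increasing means
# (the MTBP shape constraint).
mu = np.linspace(0.1, 0.9, K)

pulls = np.zeros(K, dtype=int)
sums = np.zeros(K)

for t in range(T):
    k = t % K                      # uniform (round-robin) allocation
    reward = rng.normal(mu[k], 1)  # Gaussian rewards with unit variance
    pulls[k] += 1
    sums[k] += reward

mu_hat = sums / pulls
estimated_set = {k for k in range(K) if mu_hat[k] >= tau}
true_set = {k for k in range(K) if mu[k] >= tau}

# The probability of error studied in the paper is the probability
# that at least one arm is misclassified, i.e. that these sets differ.
print("error:", estimated_set != true_set)
```

The structured variants (MTBP, CTBP) ask how much the monotone or concave shape of $(\mu_k)_{k=1}^K$ can improve on such an unstructured allocation in the problem dependent regime.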
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z)
- Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the mis-identification (of the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z)
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Multi-Fidelity Multi-Armed Bandits Revisited [46.19926456682379]
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem.
MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy.
arXiv Detail & Related papers (2023-06-13T13:19:20Z)
- Contextual Combinatorial Bandits with Probabilistically Triggered Arms [45.305256056479045]
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UC-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z)
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
- The Influence of Shape Constraints on the Thresholding Bandit Problem [74.73898216415049]
We investigate the Thresholding Bandit problem (TBP) under several shape constraints.
In the TBP problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold.
We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for TBP, (ii) $\sqrt{\log(K)/T}$ for UTBP, (iii) $\sqrt{K/T}$ for CTBP and (iv)
arXiv Detail & Related papers (2020-06-17T17:12:32Z)