A One-Size-Fits-All Solution to Conservative Bandit Problems
- URL: http://arxiv.org/abs/2012.07341v3
- Date: Wed, 16 Dec 2020 08:15:50 GMT
- Title: A One-Size-Fits-All Solution to Conservative Bandit Problems
- Authors: Yihan Du, Siwei Wang, Longbo Huang
- Abstract summary: We study a family of conservative bandit problems (CBPs) with sample-path reward constraints.
We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems.
- Score: 32.907883501286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study a family of conservative bandit problems (CBPs) with
sample-path reward constraints, i.e., the learner's reward performance must be
at least as good as that of a given baseline at any time. We propose a
One-Size-Fits-All solution to CBPs and present its applications to three
encompassed problems, i.e., conservative multi-armed bandits (CMAB),
conservative linear bandits (CLB), and conservative contextual combinatorial
bandits (CCCB). Different from previous works, which consider high-probability
constraints on the expected reward, we focus on a sample-path constraint on the
actually received reward, and achieve better theoretical guarantees
($T$-independent additive regrets instead of $T$-dependent) and empirical
performance. Furthermore, we extend the results and consider a novel
conservative mean-variance bandit problem (MV-CBP), which measures the learning
performance with both the expected reward and variability. For this extended
problem, we provide a novel algorithm with $O(1/T)$ normalized additive regrets
($T$-independent in the cumulative form) and validate this result through
empirical evaluation.
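As a rough illustration of the sample-path constraint described above, the sketch below shows a conservative play rule of the kind common in the conservative-bandit literature: the learner pulls the exploratory (e.g., UCB-optimal) arm only if, even under a pessimistic estimate of its reward, the cumulative reward actually received stays above a $(1-\alpha)$ fraction of the baseline's cumulative reward, and otherwise falls back to the baseline arm. The function name, arguments, and the exact form of the check are illustrative assumptions, not the paper's algorithm.

```python
def conservative_play(ucb_arm, baseline_arm, cum_reward, cum_baseline_reward,
                      exploratory_reward_lower_bound, baseline_mean, alpha):
    """Hypothetical sample-path conservative play rule (illustration only).

    Pull the exploratory arm only if, even when its reward this round equals
    the pessimistic lower bound, the realized cumulative reward remains at
    least (1 - alpha) times the baseline's cumulative reward.
    """
    worst_case_total = cum_reward + exploratory_reward_lower_bound
    required_total = (1.0 - alpha) * (cum_baseline_reward + baseline_mean)
    if worst_case_total >= required_total:
        return ucb_arm      # constraint holds even in the worst case: explore
    return baseline_arm     # constraint at risk: play the safe baseline arm


# Example: with alpha = 0.1 the learner may explore only while its realized
# cumulative reward stays within 10% of the baseline's cumulative reward.
arm = conservative_play(ucb_arm=3, baseline_arm=0,
                        cum_reward=42.0, cum_baseline_reward=45.0,
                        exploratory_reward_lower_bound=0.0,
                        baseline_mean=0.5, alpha=0.1)
```

Because the check is made on the rewards actually received rather than on expectations, the constraint holds along the sample path, which is the property the abstract credits for the $T$-independent additive regret.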
Related papers
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z) - $\alpha$-Fair Contextual Bandits [10.74025233418392]
Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection.
One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round.
In this paper, we consider the $\alpha$-Fair Contextual Bandits problem, where the objective is to maximize the global $\alpha$-fair utility function.
arXiv Detail & Related papers (2023-10-22T03:42:59Z) - Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z) - On Penalization in Stochastic Multi-armed Bandits [22.04356596828437]
We study an important variant of the multi-armed bandit (MAB) problem, which takes penalization into consideration.
We propose a hard-threshold UCB-like algorithm, which enjoys many merits, including fairness, nearly optimal regret, and a better tradeoff between reward and fairness.
arXiv Detail & Related papers (2022-11-15T17:13:09Z) - Contextual bandits with concave rewards, and an application to fair ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR).
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - Bandits Corrupted by Nature: Lower Bounds on Regret and Robust
Optimistic Algorithm [14.214707836697823]
We study the corrupted bandit problem, i.e. a multi-armed bandit problem with $k$ unknown reward distributions.
We propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation.
We experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruptions.
arXiv Detail & Related papers (2022-03-07T07:44:05Z) - Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Improved Algorithms for Conservative Exploration in Bandits [113.55554483194832]
We study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2).
We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems.
arXiv Detail & Related papers (2020-02-08T19:35:01Z)
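The "Bandits Corrupted by Nature" entry above builds its HubUCB index on Huber's estimator for robust mean estimation. Below is a loose, generic sketch of that estimator, not the paper's HubUCB algorithm; the iteratively reweighted update, parameter names, and default threshold are assumptions.

```python
def huber_mean(samples, delta=1.0, tol=1e-8, max_iter=100):
    """Huber M-estimate of the mean via iteratively reweighted averaging.

    Samples farther than `delta` from the current estimate are down-weighted,
    which makes the estimate robust to heavy tails and corrupted rewards.
    """
    ordered = sorted(samples)
    mu = ordered[len(ordered) // 2]          # robust initialization (median)
    for _ in range(max_iter):
        weights = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu)
                   for x in ordered]
        mu_new = sum(w * x for w, x in zip(weights, ordered)) / sum(weights)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

A UCB-style index for a corrupted bandit could then add an exploration bonus to the Huber mean of each arm's observed rewards instead of to the empirical mean.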