The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity
- URL: http://arxiv.org/abs/2002.00558v6
- Date: Sun, 12 Jun 2022 19:54:20 GMT
- Title: The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity
- Authors: Mark Sellke, Aleksandrs Slivkins
- Abstract summary: We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
- Score: 83.81297078039836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider incentivized exploration: a version of multi-armed bandits where
the choice of arms is controlled by self-interested agents, and the algorithm
can only issue recommendations. The algorithm controls the flow of information,
and the information asymmetry can incentivize the agents to explore. Prior work
achieves optimal regret rates up to multiplicative factors that become
arbitrarily large depending on the Bayesian priors, and scale exponentially in
the number of arms. A more basic problem of sampling each arm once runs into
similar factors.
We focus on the price of incentives: the loss in performance, broadly
construed, incurred for the sake of incentive-compatibility. We prove that
Thompson Sampling, a standard bandit algorithm, is incentive-compatible if
initialized with sufficiently many data points. The performance loss due to
incentives is therefore limited to the initial rounds when these data points
are collected. The problem is largely reduced to that of sample complexity: how
many rounds are needed? We address this question, providing matching upper and
lower bounds and instantiating them in various corollaries. Typically, the
optimal sample complexity is polynomial in the number of arms and exponential
in the "strength of beliefs".
Related papers
- The Batch Complexity of Bandit Pure Exploration [10.036727981085223]
In a pure exploration problem in multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms distributions.
We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations.
We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task.
arXiv Detail & Related papers (2025-02-03T15:03:45Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Best Arm Identification in Batched Multi-armed Bandit Problems [3.908867017036043]
Recently multi-armed bandit problem arises in many real-life scenarios where arms must be sampled in batches.
We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification.
We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.
arXiv Detail & Related papers (2023-12-21T14:16:38Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Thompson Exploration with Best Challenger Rule in Best Arm
Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with, potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.