Incentivizing Combinatorial Bandit Exploration
- URL: http://arxiv.org/abs/2206.00494v1
- Date: Wed, 1 Jun 2022 13:46:25 GMT
- Title: Incentivizing Combinatorial Bandit Exploration
- Authors: Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu
- Abstract summary: Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
- Score: 87.08827496301839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a bandit algorithm that recommends actions to self-interested users
in a recommendation system. The users are free to choose other actions and need
to be incentivized to follow the algorithm's recommendations. While the users
prefer to exploit, the algorithm can incentivize them to explore by leveraging
the information collected from the previous users. All published work on this
problem, known as incentivized exploration, focuses on small, unstructured
action sets and mainly targets the case when the users' beliefs are independent
across actions. However, realistic exploration problems often feature large,
structured action sets and highly correlated beliefs. We focus on a
paradigmatic exploration problem with structure: combinatorial semi-bandits. We
prove that Thompson Sampling, when applied to combinatorial semi-bandits, is
incentive-compatible when initialized with a sufficient number of samples of
each arm (where this number is determined in advance by the Bayesian prior).
Moreover, we design incentive-compatible algorithms for collecting the initial
samples.
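The abstract only sketches the mechanism, so below is a minimal illustrative sketch of Thompson Sampling for a combinatorial semi-bandit with a warm-start phase, not the paper's exact algorithm. The Gaussian prior and noise model, the top-k action set, the function name thompson_semi_bandit, and the hand-picked warm-start count n0 are all assumptions made for this example; in the paper, the required number of initial samples is determined by the Bayesian prior, and the initial samples themselves are collected by dedicated incentive-compatible procedures.

```python
# Illustrative sketch only (assumptions noted in the lead-in), not the paper's algorithm.
import numpy as np

def thompson_semi_bandit(true_means, k, horizon, n0, prior_mu=0.0, prior_var=1.0,
                         noise_var=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    def pull(arms):
        # Semi-bandit feedback: a noisy reward is observed for every arm in the chosen set.
        rewards = true_means[arms] + rng.normal(0.0, np.sqrt(noise_var), size=len(arms))
        counts[arms] += 1
        sums[arms] += rewards

    # Warm start: collect n0 samples of each arm before running Thompson Sampling.
    # (The paper designs incentive-compatible ways to gather these; here we simply pull.)
    for _ in range(n0):
        for start in range(0, n_arms, k):
            pull(np.arange(start, min(start + k, n_arms)))

    for _ in range(horizon):
        # Conjugate Gaussian posterior for each arm's mean reward.
        post_var = 1.0 / (1.0 / prior_var + counts / noise_var)
        post_mu = post_var * (prior_mu / prior_var + sums / noise_var)
        # Sample from the posterior and recommend the best set of k arms.
        sampled = rng.normal(post_mu, np.sqrt(post_var))
        pull(np.argsort(sampled)[-k:])

    return sums / np.maximum(counts, 1)  # empirical means after the run

# Example usage: 6 arms, recommended sets of size 2, hypothetical warm-start of 5 rounds.
est = thompson_semi_bandit(np.array([0.1, 0.4, 0.2, 0.8, 0.5, 0.3]),
                           k=2, horizon=500, n0=5)
```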
Related papers
- Incentivizing Exploration with Linear Contexts and Combinatorial Actions [9.15749739027059]
In incentivized bandit exploration, arm choices are viewed as recommendations and are required to be Bayesian incentive compatible.
Recent work has shown under certain independence assumptions that after collecting enough initial samples, the popular Thompson sampling algorithm becomes incentive compatible.
We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition.
arXiv Detail & Related papers (2023-06-03T03:30:42Z) - An Empirical Evaluation of Federated Contextual Bandit Algorithms [27.275089644378376]
Federated learning can be done using implicit signals generated as users interact with applications of interest.
We develop variants of prominent contextual bandit algorithms from the centralized setting for the federated setting.
Our experiments reveal the surprising effectiveness of the simple and commonly used softmax in balancing the well-known exploration-exploitation tradeoff.
arXiv Detail & Related papers (2023-03-17T19:22:30Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Bayesian decision-making under misspecified priors with applications to
meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, the proposed method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Combinatorial Pure Exploration with Full-bandit Feedback and Beyond:
Solving Combinatorial Optimization under Uncertainty with Limited Observation [70.41056265629815]
When developing an algorithm for optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs.
In this article, we review recently proposed techniques for pure exploration problems with limited feedback.
arXiv Detail & Related papers (2020-12-31T12:40:52Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)