Bandit Social Learning: Exploration under Myopic Behavior
- URL: http://arxiv.org/abs/2302.07425v4
- Date: Fri, 3 Nov 2023 22:26:50 GMT
- Title: Bandit Social Learning: Exploration under Myopic Behavior
- Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs
Slivkins
- Abstract summary: We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
- Score: 58.75758600464338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study social learning dynamics motivated by reviews on online platforms.
The agents collectively follow a simple multi-armed bandit protocol, but each
agent acts myopically, without regard to exploration. We allow a wide range of
myopic behaviors that are consistent with (parameterized) confidence intervals
for the arms' expected rewards. We derive stark learning failures for any such
behavior, and provide matching positive results. As a special case, we obtain
the first general results on the failure of the greedy algorithm in bandits, thus
providing a theoretical foundation for why bandit algorithms should explore.
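The failure mode the abstract describes can be seen in a small simulation. The sketch below is illustrative, not the paper's construction: the arm means, horizon, and the choice of UCB1 as the exploring counterpart are all assumptions for the example. With constant probability, a purely greedy rule locks onto the worse arm after one unlucky initial sample, while UCB1's confidence-interval bonus forces enough exploration to recover.

```python
import math
import random

def bernoulli(p, rng):
    """One Bernoulli(p) reward, e.g. a single review being positive."""
    return 1.0 if rng.random() < p else 0.0

def greedy(mu, horizon, rng):
    """Myopic agents: each pulls the arm with the highest empirical mean.
    Returns the arm the final agent would choose."""
    counts = [1, 1]
    sums = [bernoulli(mu[0], rng), bernoulli(mu[1], rng)]  # one warm-up pull per arm
    for _ in range(horizon - 2):
        arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        sums[arm] += bernoulli(mu[arm], rng)
        counts[arm] += 1
    return 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1

def ucb1(mu, horizon, rng):
    """UCB1: optimism via a confidence-interval bonus forces exploration.
    Returns the arm pulled most often."""
    counts = [1, 1]
    sums = [bernoulli(mu[0], rng), bernoulli(mu[1], rng)]
    for t in range(2, horizon):
        idx = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t + 1) / counts[i])
               for i in range(2)]
        arm = 0 if idx[0] >= idx[1] else 1
        sums[arm] += bernoulli(mu[arm], rng)
        counts[arm] += 1
    return 0 if counts[0] >= counts[1] else 1

rng = random.Random(0)
mu = [0.8, 0.3]  # arm 0 is best; gap = 0.5 (illustrative values)
trials, horizon = 1000, 300
greedy_fail = sum(greedy(mu, horizon, rng) != 0 for _ in range(trials)) / trials
ucb_fail = sum(ucb1(mu, horizon, rng) != 0 for _ in range(trials)) / trials
print(f"greedy ends on the wrong arm in {greedy_fail:.0%} of runs, UCB1 in {ucb_fail:.0%}")
```

The intuition behind the failure: if arm 0's first sample is 0 and arm 1's is 1, the greedy rule never revisits arm 0, since arm 1's empirical mean stays strictly positive forever; that single event already occurs with constant probability, independent of the horizon.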
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a fixed and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Pure Exploration of Causal Bandits [9.77519365079468]
The causal bandit problem integrates causal inference with multi-armed bandits.
The online learning task: given a causal graph with unknown causal inference distributions, at each step we can either intervene on one variable or make no intervention.
We provide the first gap-dependent, fully adaptive pure exploration algorithms for three types of causal models.
arXiv Detail & Related papers (2022-06-16T02:19:37Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - The Combinatorial Multi-Bandit Problem and its Application to Energy
Management [2.236663830879273]
We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management.
For our energy management application we propose a range of algorithms that combine exploration principles for multi-armed bandits with mathematical programming.
arXiv Detail & Related papers (2020-10-30T13:42:54Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values of interest are the modes of the reward distributions rather than the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.