Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models
- URL: http://arxiv.org/abs/2108.06422v1
- Date: Fri, 13 Aug 2021 22:45:05 GMT
- Title: Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models
- Authors: Runzhe Wan, Lin Ge, Rui Song
- Abstract summary: How to explore efficiently is a central problem in multi-armed bandits.
We introduce the metadata-based multi-task bandit problem.
We propose to capture task relations through the lens of Bayesian hierarchical models.
- Score: 7.458639397686894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to explore efficiently is a central problem in multi-armed bandits. In
this paper, we introduce the metadata-based multi-task bandit problem, where
the agent needs to solve a large number of related multi-armed bandit tasks and
can leverage some task-specific features (i.e., metadata) to share knowledge
across tasks. As a general framework, we propose to capture task relations
through the lens of Bayesian hierarchical models, upon which a Thompson
sampling algorithm is designed to efficiently learn task relations, share
information, and minimize the cumulative regrets. Two concrete examples for
Gaussian bandits and Bernoulli bandits are carefully analyzed. The Bayes regret
for Gaussian bandits clearly demonstrates the benefits of information sharing
with our algorithm. The proposed method is further supported by extensive
experiments.
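To make the framework concrete, below is a minimal sketch of metadata-based Thompson sampling for the Gaussian case in the spirit of the abstract. The model, priors, and the simplification of marginalizing out the task-level deviation are illustrative assumptions, not the authors' exact algorithm.

```python
# Illustrative sketch of metadata-based hierarchical Thompson sampling for
# Gaussian bandits. Assumed model (ours, for illustration): the mean reward
# of arm a in task i is r_{i,a} ~ N(x_i^T theta_a, sigma1^2), where x_i is
# the task metadata, and observed rewards are N(r_{i,a}, sigma^2). For
# brevity the task-level deviation is marginalized out, so each pull is
# treated as y = x_i^T theta_a + noise with variance sigma^2 + sigma1^2;
# this is exact only when each task-arm pair is pulled once. The paper's
# algorithm maintains the full hierarchical posterior instead.
import numpy as np

class MetadataHierTS:
    def __init__(self, n_arms, dim, sigma=1.0, sigma1=1.0, prior_var=10.0):
        self.noise_var = sigma ** 2 + sigma1 ** 2
        # Conjugate Gaussian prior theta_a ~ N(0, prior_var * I), stored as
        # a precision matrix and a precision-weighted mean per arm.
        self.precision = [np.eye(dim) / prior_var for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def choose_arm(self, x, rng=np.random):
        # Thompson sampling: draw theta_a from each arm's posterior and
        # play the arm with the largest sampled mean reward x^T theta_a.
        sampled_means = []
        for P, b in zip(self.precision, self.b):
            cov = np.linalg.inv(P)
            theta = rng.multivariate_normal(cov @ b, cov)
            sampled_means.append(x @ theta)
        return int(np.argmax(sampled_means))

    def update(self, arm, x, reward):
        # Standard conjugate Bayesian linear-regression update, shared by
        # all tasks, which is what lets information flow across tasks.
        self.precision[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += reward * x / self.noise_var

# Usage on a stream of tasks: for each task i with metadata x_i,
# arm = agent.choose_arm(x_i); agent.update(arm, x_i, observed_reward).
```

Because the arm-level parameters theta_a are pooled across all tasks, a new task's metadata informs exploration before any of its own rewards are observed, which is the information sharing the abstract highlights.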
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms (a toy sketch of such a preference-based reward model follows this entry).
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
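As a toy illustration of the entry above, the sketch below fits a reward model to pairwise preference feedback via a Bradley-Terry likelihood. The architecture, dimensions, and training step are placeholder assumptions, not the paper's method.

```python
# Hypothetical sketch: learn a reward model from preference feedback.
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-2)

def preference_loss(x_win, x_lose):
    # Bradley-Terry: P(x_win preferred) = sigmoid(f(x_win) - f(x_lose)).
    logits = reward_net(x_win) - reward_net(x_lose)
    return nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

# One gradient step on a toy batch of (preferred, rejected) arm contexts.
x_win, x_lose = torch.randn(16, 8), torch.randn(16, 8)
opt.zero_grad()
preference_loss(x_win, x_lose).backward()
opt.step()
```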
- Forced Exploration in Bandit Problems [12.13966146283641]
The multi-armed bandit (MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
arXiv Detail & Related papers (2023-12-12T14:00:29Z)
- Multi-task Representation Learning for Pure Exploration in Bilinear Bandits [13.773838574776338]
We study multi-task representation learning for the problem of pure exploration in bilinear bandits.
In bilinear bandits, an action takes the form of a pair of arms from two different entity types (a toy version of the bilinear reward model follows this entry).
arXiv Detail & Related papers (2023-11-01T06:30:45Z)
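A toy version of the bilinear reward model from the entry above; the low-rank structure and dimensions are illustrative assumptions.

```python
# Bilinear bandit reward model: an action is a pair (x, y) of arms from two
# entity types, with expected reward x^T Theta y for a low-rank Theta.
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(5, 1)) @ rng.normal(size=(1, 4))  # rank-1 parameter
x = rng.normal(size=5)  # arm from the first entity type (e.g., a drug)
y = rng.normal(size=4)  # arm from the second entity type (e.g., a cell line)
expected_reward = x @ Theta @ y
```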
- Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Thompson Sampling on Asymmetric $\alpha$-Stable Bandits [0.0]
By changing the assumed reward distribution, the multi-armed bandit problem can be adapted to a variety of application settings.
Thompson Sampling is a common method for solving the multi-armed bandit problem (a minimal Bernoulli version is sketched after this entry).
arXiv Detail & Related papers (2022-03-19T01:55:08Z)
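For orientation, here is the textbook Beta-Bernoulli form of Thompson sampling referenced above; the paper itself swaps in asymmetric $\alpha$-stable reward distributions, so this minimal version is standard, not theirs.

```python
# Minimal single-task Thompson sampling with Beta priors on Bernoulli arms.
import numpy as np

def thompson_bernoulli(pull, n_arms, horizon, seed=0):
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_arms)  # Beta(1, 1) uniform prior per arm
    beta = np.ones(n_arms)
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))  # sample beliefs, act greedily
        reward = pull(arm)                           # environment returns 0 or 1
        alpha[arm] += reward                         # conjugate posterior update
        beta[arm] += 1 - reward
    return alpha, beta

# Example with hypothetical true means (0.3, 0.5, 0.7):
# thompson_bernoulli(lambda a: np.random.binomial(1, [0.3, 0.5, 0.7][a]), 3, 1000)
```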
- Hierarchical Bayesian Bandits [51.67132887113412]
We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks.
arXiv Detail & Related papers (2021-11-12T20:33:09Z)
- MetaKernel: Learning Variational Random Features with Limited Labels [120.90737681252594]
Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks.
We propose to meta-learn kernels with random Fourier features for few-shot learning, which we call MetaKernel (a sketch of standard random Fourier features follows this entry).
arXiv Detail & Related papers (2021-05-08T21:24:09Z)
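For background on the random Fourier features mentioned above, below is the standard Rahimi-Recht construction that approximates an RBF kernel; MetaKernel meta-learns the feature distribution variationally rather than fixing it as done here.

```python
# Random Fourier features approximating an RBF (Gaussian) kernel.
import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# z(x)^T z(y) approximates exp(-||x - y||^2 / (2 * lengthscale^2)).
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
approx_gram = Z @ Z.T  # ~ RBF Gram matrix of X
```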
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.