Influence Diagram Bandits: Variational Thompson Sampling for Structured
Bandit Problems
- URL: http://arxiv.org/abs/2007.04915v1
- Date: Thu, 9 Jul 2020 16:25:40 GMT
- Title: Influence Diagram Bandits: Variational Thompson Sampling for Structured
Bandit Problems
- Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
- Abstract summary: Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models.
- Score: 40.957688390621385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for structured bandits, which we call an
influence diagram bandit. Our framework captures complex statistical
dependencies between actions, latent variables, and observations; and thus
unifies and extends many existing models, such as combinatorial semi-bandits,
cascading bandits, and low-rank bandits. We develop novel online learning
algorithms that learn to act efficiently in our models. The key idea is to
track a structured posterior distribution of model parameters, either exactly
or approximately. To act, we sample model parameters from their posterior and
then use the structure of the influence diagram to find the most optimistic
action under the sampled parameters. We empirically evaluate our algorithms in
three structured bandit problems, and show that they perform as well as or
better than problem-specific state-of-the-art baselines.
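The acting step described in the abstract, sample model parameters from a structured posterior and then optimize over the influence diagram, reduces to a short loop in the special cases the paper lists. Below is a minimal sketch, assuming a Bernoulli combinatorial semi-bandit with an independent Beta posterior per item; the environment, the posterior choice, and all function names are illustrative assumptions rather than the paper's actual implementation.

```python
# A minimal sketch, assuming a Bernoulli combinatorial semi-bandit (one of the
# special cases named in the abstract): the action is a set of K items, each
# item has an unknown mean, and per-item feedback is observed. The environment,
# the independent Beta posteriors, and all names are illustrative assumptions.
import numpy as np

def thompson_semi_bandit(true_means, K, horizon, seed=0):
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means)
    alpha = np.ones(len(true_means))  # Beta posterior: successes + 1
    beta = np.ones(len(true_means))   # Beta posterior: failures + 1
    total_reward = 0.0
    for _ in range(horizon):
        # 1) Sample model parameters from the (factored) posterior.
        theta = rng.beta(alpha, beta)
        # 2) Optimize under the sampled parameters; with this simple structure,
        #    the most optimistic action is just the K items with largest theta.
        action = np.argsort(theta)[-K:]
        # 3) Observe per-item Bernoulli feedback and update the posterior.
        feedback = rng.random(K) < true_means[action]
        alpha[action] += feedback
        beta[action] += ~feedback
        total_reward += feedback.sum()
    return total_reward

# Example run: 10 items, pick 3 per round over 1000 rounds.
print(thompson_semi_bandit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95],
                           K=3, horizon=1000))
```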
Related papers
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms.
We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case.
We introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models.
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
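The item above cites Exp3 as a particular case of its algorithm family. For orientation, here is a generic, minimal Exp3 sketch for the adversarial multi-armed bandit; the reward function and parameter choices are assumptions, and this is not the nested-logit family proposed in the cited paper.

```python
# A generic Exp3 sketch for the adversarial multi-armed bandit, shown only to
# illustrate the baseline algorithm mentioned above; the reward function and
# parameter choices are assumptions, not the cited paper's method.
import numpy as np

def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    total = 0.0
    for t in range(horizon):
        # Mix the exponential-weights distribution with uniform exploration.
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(t, arm)  # assumed to lie in [0, 1]
        total += reward
        # Importance-weighted exponential update of the played arm only.
        weights[arm] *= np.exp(gamma * reward / (probs[arm] * n_arms))
        weights /= weights.max()    # rescale for numerical stability
    return total

# Example: stochastic rewards standing in for an oblivious adversary.
means = np.array([0.3, 0.5, 0.6])
env_rng = np.random.default_rng(1)
print(exp3(lambda t, a: float(env_rng.random() < means[a]), n_arms=3, horizon=2000))
```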
- Generalizing Hierarchical Bayesian Bandits [14.986031916712108]
A contextual bandit is a popular and practical framework for online learning to act under uncertainty.
In this work, we introduce a general framework for capturing such correlations through a two-level graphical model.
We propose a Thompson sampling algorithm G-HierTS that uses this structure to explore efficiently and bound its Bayes regret.
arXiv Detail & Related papers (2022-05-30T14:17:56Z)
- Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework [11.778985277618354]
We propose a unified meta-learning framework for a class of structured bandit problems where the parameter space can be factorized to item-level.
The proposed bandit algorithm is general enough to apply to many popular problems, scales to huge parameter and action spaces, and is robust to the specification of the generalization model.
arXiv Detail & Related papers (2022-02-26T20:54:55Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits.
We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z)
- On Learning to Rank Long Sequences with Contextual Bandits [17.97356309346139]
We introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses.
Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature.
arXiv Detail & Related papers (2021-06-07T12:16:34Z)
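Both the main abstract and the item above build on the cascading bandit model, in which the learner shows a ranked list and the user clicks the first attractive item. The sketch below is a minimal Thompson sampling loop for that standard model, with Beta posteriors, a simulator, and names chosen purely for illustration; it is not the variable-length variant proposed in the cited paper.

```python
# A minimal Thompson sampling sketch for the standard cascading bandit model
# referenced above (and listed as a special case in the main abstract): the
# learner shows K items, the user scans them in order and clicks the first
# attractive one. The Beta posteriors, simulator, and names are illustrative
# assumptions, not the variable-length variant of the cited paper.
import numpy as np

def cascade_ts(attraction, K, horizon, seed=0):
    rng = np.random.default_rng(seed)
    attraction = np.asarray(attraction)
    alpha = np.ones(len(attraction))  # Beta posterior: clicks + 1
    beta = np.ones(len(attraction))   # Beta posterior: observed skips + 1
    clicks = 0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)          # sample attraction probabilities
        ranking = np.argsort(theta)[::-1][:K]  # show the K most attractive items
        # Simulate the cascade: the user clicks the first attractive item, if any.
        click_pos = K
        for pos, item in enumerate(ranking):
            if rng.random() < attraction[item]:
                click_pos = pos
                break
        # Items above the click were examined and skipped; the clicked item was
        # examined and attractive; items below the click are unobserved.
        for pos, item in enumerate(ranking[:min(click_pos + 1, K)]):
            if pos == click_pos:
                alpha[item] += 1
            else:
                beta[item] += 1
        clicks += int(click_pos < K)
    return clicks

# Example: 8 items, lists of length 3 over 2000 rounds.
print(cascade_ts([0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.5, 0.6], K=3, horizon=2000))
```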
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
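The latent bandit setting above admits a particularly compact Thompson sampling loop: maintain a posterior over the discrete latent state, sample a state, and play the arm that is best for it. The sketch below assumes known Bernoulli conditional means and a hypothetical simulator; it illustrates the setting only and is not the cited paper's algorithm.

```python
# A minimal sketch of Thompson sampling in the latent bandit setting described
# above: the per-state arm means are assumed known and Bernoulli, only the
# discrete latent state is unknown. The simulator and names are illustrative
# assumptions; this is not the cited paper's algorithm.
import numpy as np

def latent_bandit_ts(means, true_state, horizon, seed=0):
    rng = np.random.default_rng(seed)
    means = np.asarray(means)              # means[s, a]: mean of arm a in state s
    n_states, n_arms = means.shape
    belief = np.full(n_states, 1.0 / n_states)  # posterior over latent states
    total = 0.0
    for _ in range(horizon):
        s = rng.choice(n_states, p=belief)       # sample a latent state
        a = int(np.argmax(means[s]))             # best arm under the sampled state
        r = float(rng.random() < means[true_state, a])  # Bernoulli reward
        # Bayes update of the belief using the known conditional likelihoods.
        likelihood = means[:, a] if r else 1.0 - means[:, a]
        belief = belief * likelihood
        belief /= belief.sum()
        total += r
    return total, belief

# Example: 3 latent states, 4 arms, the true state is 1.
means = [[0.9, 0.1, 0.1, 0.1],
         [0.1, 0.8, 0.1, 0.1],
         [0.1, 0.1, 0.2, 0.7]]
print(latent_bandit_ts(means, true_state=1, horizon=500))
```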
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.