Hierarchical Bayesian Bandits
- URL: http://arxiv.org/abs/2111.06929v1
- Date: Fri, 12 Nov 2021 20:33:09 GMT
- Title: Hierarchical Bayesian Bandits
- Authors: Joey Hong and Branislav Kveton and Manzil Zaheer and Mohammad Ghavamzadeh
- Abstract summary: We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks.
- Score: 51.67132887113412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-, multi-task, and federated learning can all be viewed as solving
similar tasks, drawn from an unknown distribution that reflects task
similarities. In this work, we provide a unified view of all these problems, as
learning to act in a hierarchical Bayesian bandit. We analyze a natural
hierarchical Thompson sampling algorithm (hierTS) that can be applied to any
problem in this class. Our regret bounds hold under many instances of such
problems, including when the tasks are solved sequentially or in parallel; and
capture the structure of the problems, such that the regret decreases with the
width of the task prior. Our proofs rely on novel total variance
decompositions, which can be applied to other graphical model structures.
Finally, our theory is complemented by experiments, which show that the
hierarchical structure helps with knowledge sharing among the tasks. This
confirms that hierarchical Bayesian bandits are a universal and
statistically-efficient tool for learning to act with similar bandit tasks.
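The hierarchical Thompson sampling idea described in the abstract can be illustrated with a small Gaussian sketch: a shared hyper-parameter is sampled from a hyper-posterior, used as the prior for each task's Thompson sampling, and the task's evidence is then folded back into the hyper-posterior. All problem sizes, noise scales, and the simplified (diagonal, per-arm) posterior updates below are illustrative assumptions, not the paper's exact hierTS algorithm or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and noise scales (assumptions, not the paper's).
K, n_tasks, horizon = 3, 20, 200
sigma, sigma_q, sigma_0 = 0.5, 0.3, 1.0  # reward / task / hyper-prior std

mu_star = rng.normal(0.0, sigma_0, K)    # unknown shared hyper-parameter
regret = 0.0

# Hyper-posterior over mu_star, tracked per arm (diagonal Gaussian).
hyper_mean, hyper_var = np.zeros(K), np.full(K, sigma_0 ** 2)

for s in range(n_tasks):
    theta_s = rng.normal(mu_star, sigma_q)          # this task's arm means
    # Sample a hyper-parameter, then use it as this task's prior.
    mu_tilde = rng.normal(hyper_mean, np.sqrt(hyper_var))
    post_mean, post_var = mu_tilde.copy(), np.full(K, sigma_q ** 2)
    for t in range(horizon):
        sample = rng.normal(post_mean, np.sqrt(post_var))
        a = int(np.argmax(sample))                  # Thompson sampling step
        r = rng.normal(theta_s[a], sigma)
        # Conjugate Gaussian update of the chosen arm's task posterior.
        prec = 1.0 / post_var[a] + 1.0 / sigma ** 2
        post_mean[a] = (post_mean[a] / post_var[a] + r / sigma ** 2) / prec
        post_var[a] = 1.0 / prec
        regret += theta_s.max() - theta_s[a]
    # Fold the task's posterior back into the hyper-posterior; this moment
    # update is a simplification of the paper's exact joint posterior.
    obs_var = post_var + sigma_q ** 2
    prec = 1.0 / hyper_var + 1.0 / obs_var
    hyper_mean = (hyper_mean / hyper_var + post_mean / obs_var) / prec
    hyper_var = 1.0 / prec

print(f"total regret over {n_tasks} tasks: {regret:.1f}")
```

As tasks accumulate, the hyper-posterior variance shrinks, so later tasks start from a tighter prior, which is the knowledge-sharing effect the experiments describe.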
Related papers
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models [7.458639397686894]
How to explore efficiently is a central problem in multi-armed bandits.
We introduce the metadata-based multi-task bandit problem.
We propose to capture task relations through the lens of Bayesian hierarchical models.
arXiv Detail & Related papers (2021-08-13T22:45:05Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics [24.57617154267565]
We investigate how forgetting affects representations in neural network models.
We find that deeper layers are disproportionately the source of forgetting.
We also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.
arXiv Detail & Related papers (2020-07-14T23:31:14Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.