Hierarchical Bayesian Bandits
- URL: http://arxiv.org/abs/2111.06929v1
- Date: Fri, 12 Nov 2021 20:33:09 GMT
- Title: Hierarchical Bayesian Bandits
- Authors: Joey Hong and Branislav Kveton and Manzil Zaheer and Mohammad Ghavamzadeh
- Abstract summary: We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks.
- Score: 51.67132887113412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-, multi-task, and federated learning can all be viewed as solving
similar tasks, drawn from an unknown distribution that reflects task
similarities. In this work, we provide a unified view of all these problems, as
learning to act in a hierarchical Bayesian bandit. We analyze a natural
hierarchical Thompson sampling algorithm (hierTS) that can be applied to any
problem in this class. Our regret bounds hold under many instances of such
problems, including when the tasks are solved sequentially or in parallel; and
capture the structure of the problems, such that the regret decreases with the
width of the task prior. Our proofs rely on novel total variance
decompositions, which can be applied to other graphical model structures.
Finally, our theory is complemented by experiments, which show that the
hierarchical structure helps with knowledge sharing among the tasks. This
confirms that hierarchical Bayesian bandits are a universal and
statistically-efficient tool for learning to act with similar bandit tasks.
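The hierarchical Thompson sampling idea described in the abstract can be illustrated with a small Gaussian sketch: a shared hyper-parameter is sampled from a hyper-posterior, used as the prior for each task's Thompson sampling, and the task's evidence is then folded back into the hyper-posterior. All problem sizes, noise scales, and the simplified (diagonal, per-arm) posterior updates below are illustrative assumptions, not the paper's exact hierTS algorithm or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and noise scales (assumptions, not the paper's).
K, n_tasks, horizon = 3, 20, 200
sigma, sigma_q, sigma_0 = 0.5, 0.3, 1.0  # reward / task / hyper-prior std

mu_star = rng.normal(0.0, sigma_0, K)    # unknown shared hyper-parameter
regret = 0.0

# Hyper-posterior over mu_star, tracked per arm (diagonal Gaussian).
hyper_mean, hyper_var = np.zeros(K), np.full(K, sigma_0 ** 2)

for s in range(n_tasks):
    theta_s = rng.normal(mu_star, sigma_q)          # this task's arm means
    # Sample a hyper-parameter, then use it as this task's prior.
    mu_tilde = rng.normal(hyper_mean, np.sqrt(hyper_var))
    post_mean, post_var = mu_tilde.copy(), np.full(K, sigma_q ** 2)
    for t in range(horizon):
        sample = rng.normal(post_mean, np.sqrt(post_var))
        a = int(np.argmax(sample))                  # Thompson sampling step
        r = rng.normal(theta_s[a], sigma)
        # Conjugate Gaussian update of the chosen arm's task posterior.
        prec = 1.0 / post_var[a] + 1.0 / sigma ** 2
        post_mean[a] = (post_mean[a] / post_var[a] + r / sigma ** 2) / prec
        post_var[a] = 1.0 / prec
        regret += theta_s.max() - theta_s[a]
    # Fold the task's posterior back into the hyper-posterior; this moment
    # update is a simplification of the paper's exact joint posterior.
    obs_var = post_var + sigma_q ** 2
    prec = 1.0 / hyper_var + 1.0 / obs_var
    hyper_mean = (hyper_mean / hyper_var + post_mean / obs_var) / prec
    hyper_var = 1.0 / prec

print(f"total regret over {n_tasks} tasks: {regret:.1f}")
```

As tasks accumulate, the hyper-posterior variance shrinks, so later tasks start from a tighter prior, which is the knowledge-sharing effect the experiments describe.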
Related papers
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models [7.458639397686894]
How to explore efficiently is a central problem in multi-armed bandits.
We introduce the metadata-based multi-task bandit problem.
We propose to capture task relations through the lens of Bayesian hierarchical models.
arXiv Detail & Related papers (2021-08-13T22:45:05Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics [24.57617154267565]
We investigate how forgetting affects representations in neural network models.
We find that deeper layers are disproportionately the source of forgetting.
We also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.
arXiv Detail & Related papers (2020-07-14T23:31:14Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.