Deep Hierarchy in Bandits
- URL: http://arxiv.org/abs/2202.01454v1
- Date: Thu, 3 Feb 2022 08:15:53 GMT
- Title: Deep Hierarchy in Bandits
- Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, and
Mohammad Ghavamzadeh
- Abstract summary: Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
- Score: 51.22833900944146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean rewards of actions are often correlated. The form of these correlations
may be complex and unknown a priori, such as the preferences of a user for
recommended products and their categories. To maximize statistical efficiency,
it is important to leverage these correlations when learning. We formulate a
bandit variant of this problem where the correlations of mean action rewards
are represented by a hierarchical Bayesian model with latent variables. Since
the hierarchy can have multiple layers, we call it deep. We propose a
hierarchical Thompson sampling algorithm (HierTS) for this problem, and show
how to implement it efficiently for Gaussian hierarchies. The efficient
implementation is possible due to a novel exact hierarchical representation of
the posterior, which itself is of independent interest. We use this exact
posterior to analyze the Bayes regret of HierTS in Gaussian bandits. Our
analysis reflects the structure of the problem, that the regret decreases with
the prior width, and also shows that hierarchies reduce the regret by
non-constant factors in the number of actions. We confirm these theoretical
findings empirically, in both synthetic and real-world experiments.
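Since the abstract describes HierTS only at a high level, here is a minimal sketch of hierarchical Thompson sampling for a two-level Gaussian hierarchy: a single latent mean is drawn from a prior, the action means are drawn around it, and the Gaussian posterior is maintained exactly in hierarchical form (a marginal over the latent mean, then per-action conditionals). The variable names, hyperparameter values, and simulated environment below are illustrative assumptions, not the paper's exact deep (multi-layer) construction.

```python
# Hedged sketch of two-level Gaussian hierarchical Thompson sampling.
# Assumed model: mu ~ N(mu0, sigma0^2), theta_i ~ N(mu, sigma1^2),
# reward r ~ N(theta_i, sigma^2). Not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)

K = 10        # number of actions (assumed)
mu0 = 0.0     # prior mean of the latent variable
sigma0 = 1.0  # prior width of the latent mean
sigma1 = 0.5  # spread of action means around the latent mean
sigma = 1.0   # reward noise
T = 2000      # horizon

# Simulated environment drawn from the assumed hierarchy.
mu_true = rng.normal(mu0, sigma0)
theta_true = rng.normal(mu_true, sigma1, size=K)

n = np.zeros(K)  # pull counts per action
s = np.zeros(K)  # reward sums per action

for t in range(T):
    # Exact marginal posterior of the latent mean: each played action
    # contributes its sample mean with variance sigma1^2 + sigma^2 / n_i
    # (its own mean theta_i is integrated out).
    played = n > 0
    prec_mu = 1.0 / sigma0**2
    acc = mu0 / sigma0**2
    if played.any():
        var_i = sigma1**2 + sigma**2 / n[played]
        prec_mu += np.sum(1.0 / var_i)
        acc += np.sum((s[played] / n[played]) / var_i)
    mu_hat, mu_var = acc / prec_mu, 1.0 / prec_mu

    # Hierarchical sampling: latent mean first, then each action mean
    # conditioned on the sampled mu and that action's own observations.
    mu_sample = rng.normal(mu_hat, np.sqrt(mu_var))
    prec_theta = 1.0 / sigma1**2 + n / sigma**2
    theta_mean = (mu_sample / sigma1**2 + s / sigma**2) / prec_theta
    theta_sample = rng.normal(theta_mean, np.sqrt(1.0 / prec_theta))

    # Act on the posterior sample, observe a reward, update statistics.
    a = int(np.argmax(theta_sample))
    r = rng.normal(theta_true[a], sigma)
    n[a] += 1
    s[a] += r

print("best action:", int(np.argmax(theta_true)), "most played:", int(np.argmax(n)))
```

The point the sketch tries to convey is the one the abstract emphasizes: for Gaussian hierarchies the posterior factors into a marginal over the latent variable and per-action conditionals, so Thompson sampling can sample top down without approximate inference.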
Related papers
- Reducing the dimensionality and granularity in hierarchical categorical variables [2.089191490381739]
We propose a methodology to obtain a reduced representation of a hierarchical categorical variable.
We show how entity embedding can be applied in a hierarchical setting.
We apply our methodology on a real dataset and find that the reduced hierarchy is an improvement over the original hierarchical structure.
arXiv Detail & Related papers (2024-03-06T11:09:36Z)
- Discouraging posterior collapse in hierarchical Variational Autoencoders using context [19.47169312443202]
There is a consensus that top-down hierarchical VAEs allow effective learning of deep latent structures and avoid problems like posterior collapse.
Here, we show that this is not necessarily the case, and the problem of collapsing posteriors remains.
We propose a deep hierarchical VAE with a context on top. Specifically, we use a Discrete Cosine Transform to obtain the last latent variable.
arXiv Detail & Related papers (2023-02-20T13:44:47Z)
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z)
- Generalizing Hierarchical Bayesian Bandits [14.986031916712108]
A contextual bandit is a popular and practical framework for online learning to act under uncertainty.
In this work, we introduce a general framework for capturing such correlations through a two-level graphical model.
We propose a Thompson sampling algorithm, G-HierTS, that uses this structure to explore efficiently, and we bound its Bayes regret.
arXiv Detail & Related papers (2022-05-30T14:17:56Z)
- Hierarchical Bayesian Bandits [51.67132887113412]
We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks.
arXiv Detail & Related papers (2021-11-12T20:33:09Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art (a generic zeroth-order sketch of such a one-step attack follows this list).
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
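As referenced in the black-box attack entry above, here is a minimal, generic sketch of a one-step, FGSM-style attack driven only by zeroth-order (loss-value) queries, using isotropic Gaussian finite differences to estimate the gradient sign. It does not implement that paper's Gaussian MRF covariance model; the toy loss, parameter values, and function names are assumptions for illustration.

```python
# Generic zeroth-order, one-step sign attack (illustrative; not the cited
# paper's method, which models query directions with a Gaussian MRF).
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(x):
    # Stand-in for a black-box model's loss on input x (assumption).
    return float(np.sum(np.sin(x) + 0.1 * x**2))

def zeroth_order_fgsm(x, loss, eps=0.05, n_queries=200, delta=1e-2):
    """One FGSM-style step from a finite-difference gradient estimate."""
    grad_est = np.zeros_like(x)
    for _ in range(n_queries):
        u = rng.normal(size=x.shape)                  # random probe direction
        diff = loss(x + delta * u) - loss(x - delta * u)
        grad_est += diff / (2.0 * delta) * u          # two-point estimator
    grad_est /= n_queries
    return x + eps * np.sign(grad_est)                # ascend the loss

x = rng.normal(size=16)
x_adv = zeroth_order_fgsm(x, toy_loss)
print("loss before:", toy_loss(x), "loss after:", toy_loss(x_adv))
```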
This list is automatically generated from the titles and abstracts of the papers on this site.