Deep Hierarchy in Bandits
- URL: http://arxiv.org/abs/2202.01454v1
- Date: Thu, 3 Feb 2022 08:15:53 GMT
- Title: Deep Hierarchy in Bandits
- Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, and
Mohammad Ghavamzadeh
- Abstract summary: Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
- Score: 51.22833900944146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean rewards of actions are often correlated. The form of these correlations
may be complex and unknown a priori, such as the preferences of a user for
recommended products and their categories. To maximize statistical efficiency,
it is important to leverage these correlations when learning. We formulate a
bandit variant of this problem where the correlations of mean action rewards
are represented by a hierarchical Bayesian model with latent variables. Since
the hierarchy can have multiple layers, we call it deep. We propose a
hierarchical Thompson sampling algorithm (HierTS) for this problem, and show
how to implement it efficiently for Gaussian hierarchies. The efficient
implementation is possible due to a novel exact hierarchical representation of
the posterior, which itself is of independent interest. We use this exact
posterior to analyze the Bayes regret of HierTS in Gaussian bandits. Our
analysis reflects the structure of the problem, that the regret decreases with
the prior width, and also shows that hierarchies reduce the regret by
non-constant factors in the number of actions. We confirm these theoretical
findings empirically, in both synthetic and real-world experiments.
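Since the abstract describes HierTS only at a high level, here is a minimal sketch of hierarchical Thompson sampling for a two-level Gaussian hierarchy: a single latent mean is drawn from a prior, the action means are drawn around it, and the Gaussian posterior is maintained exactly in hierarchical form (a marginal over the latent mean, then per-action conditionals). The variable names, hyperparameter values, and simulated environment below are illustrative assumptions, not the paper's exact deep (multi-layer) construction.

```python
# Hedged sketch of two-level Gaussian hierarchical Thompson sampling.
# Assumed model: mu ~ N(mu0, sigma0^2), theta_i ~ N(mu, sigma1^2),
# reward r ~ N(theta_i, sigma^2). Not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)

K = 10        # number of actions (assumed)
mu0 = 0.0     # prior mean of the latent variable
sigma0 = 1.0  # prior width of the latent mean
sigma1 = 0.5  # spread of action means around the latent mean
sigma = 1.0   # reward noise
T = 2000      # horizon

# Simulated environment drawn from the assumed hierarchy.
mu_true = rng.normal(mu0, sigma0)
theta_true = rng.normal(mu_true, sigma1, size=K)

n = np.zeros(K)  # pull counts per action
s = np.zeros(K)  # reward sums per action

for t in range(T):
    # Exact marginal posterior of the latent mean: each played action
    # contributes its sample mean with variance sigma1^2 + sigma^2 / n_i
    # (its own mean theta_i is integrated out).
    played = n > 0
    prec_mu = 1.0 / sigma0**2
    acc = mu0 / sigma0**2
    if played.any():
        var_i = sigma1**2 + sigma**2 / n[played]
        prec_mu += np.sum(1.0 / var_i)
        acc += np.sum((s[played] / n[played]) / var_i)
    mu_hat, mu_var = acc / prec_mu, 1.0 / prec_mu

    # Hierarchical sampling: latent mean first, then each action mean
    # conditioned on the sampled mu and that action's own observations.
    mu_sample = rng.normal(mu_hat, np.sqrt(mu_var))
    prec_theta = 1.0 / sigma1**2 + n / sigma**2
    theta_mean = (mu_sample / sigma1**2 + s / sigma**2) / prec_theta
    theta_sample = rng.normal(theta_mean, np.sqrt(1.0 / prec_theta))

    # Act on the posterior sample, observe a reward, update statistics.
    a = int(np.argmax(theta_sample))
    r = rng.normal(theta_true[a], sigma)
    n[a] += 1
    s[a] += r

print("best action:", int(np.argmax(theta_true)), "most played:", int(np.argmax(n)))
```

The point the sketch tries to convey is the one the abstract emphasizes: for Gaussian hierarchies the posterior factors into a marginal over the latent variable and per-action conditionals, so Thompson sampling can sample top down without approximate inference.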
Related papers
- Reducing the dimensionality and granularity in hierarchical categorical variables [2.089191490381739]
We propose a methodology to obtain a reduced representation of a hierarchical categorical variable.
We show how entity embedding can be applied in a hierarchical setting.
We apply our methodology on a real dataset and find that the reduced hierarchy is an improvement over the original hierarchical structure.
arXiv Detail & Related papers (2024-03-06T11:09:36Z)
- Discouraging posterior collapse in hierarchical Variational Autoencoders using context [19.47169312443202]
There is a consensus that top-down hierarchical VAEs allow effective learning of deep latent structures and avoid problems like posterior collapse.
Here, we show that this is not necessarily the case, and the problem of collapsing posteriors remains.
We propose a deep hierarchical VAE with a context on top. Specifically, we use a Discrete Cosine Transform to obtain the last latent variable.
arXiv Detail & Related papers (2023-02-20T13:44:47Z)
- Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
- Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z)
- Generalizing Hierarchical Bayesian Bandits [14.986031916712108]
A contextual bandit is a popular and practical framework for online learning to act under uncertainty.
In this work, we introduce a general framework for capturing such correlations through a two-level graphical model.
We propose a Thompson sampling algorithm, G-HierTS, that uses this structure to explore efficiently, and we bound its Bayes regret.
arXiv Detail & Related papers (2022-05-30T14:17:56Z)
- Hierarchical Bayesian Bandits [51.67132887113412]
We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class.
Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel.
Experiments show that the hierarchical structure helps with knowledge sharing among the tasks.
arXiv Detail & Related papers (2021-11-12T20:33:09Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art (a generic zeroth-order sketch of such a one-step attack follows this list).
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
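As referenced in the black-box attack entry above, here is a minimal, generic sketch of a one-step, FGSM-style attack driven only by zeroth-order (loss-value) queries, using isotropic Gaussian finite differences to estimate the gradient sign. It does not implement that paper's Gaussian MRF covariance model; the toy loss, parameter values, and function names are assumptions for illustration.

```python
# Generic zeroth-order, one-step sign attack (illustrative; not the cited
# paper's method, which models query directions with a Gaussian MRF).
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(x):
    # Stand-in for a black-box model's loss on input x (assumption).
    return float(np.sum(np.sin(x) + 0.1 * x**2))

def zeroth_order_fgsm(x, loss, eps=0.05, n_queries=200, delta=1e-2):
    """One FGSM-style step from a finite-difference gradient estimate."""
    grad_est = np.zeros_like(x)
    for _ in range(n_queries):
        u = rng.normal(size=x.shape)                  # random probe direction
        diff = loss(x + delta * u) - loss(x - delta * u)
        grad_est += diff / (2.0 * delta) * u          # two-point estimator
    grad_est /= n_queries
    return x + eps * np.sign(grad_est)                # ascend the loss

x = rng.normal(size=16)
x_adv = zeroth_order_fgsm(x, toy_loss)
print("loss before:", toy_loss(x), "loss after:", toy_loss(x_adv))
```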
This list is automatically generated from the titles and abstracts of the papers on this site.