Combinatorial Causal Bandits
- URL: http://arxiv.org/abs/2206.01995v1
- Date: Sat, 4 Jun 2022 14:14:58 GMT
- Title: Combinatorial Causal Bandits
- Authors: Shi Feng and Wei Chen
- Abstract summary: In causal bandits, the learning agent chooses at most $K$ variables in each round to intervene, with the goal of minimizing expected regret on the target variable $Y$.
We study under the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(sqrtTlog T)$ regret, where $T$ is the time horizon.
- Score: 25.012065471684025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In combinatorial causal bandits (CCB), the learning agent chooses at most $K$
variables in each round to intervene, collects feedback from the observed
variables, with the goal of minimizing expected regret on the target variable
$Y$. Different from all prior studies on causal bandits, CCB needs to deal with
exponentially large action space. We study under the context of binary
generalized linear models (BGLMs) with a succinct parametric representation of
the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e.
no hidden variables) based on the maximum likelihood estimation method, and
show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time
horizon. For the special case of linear models with hidden variables, we apply
causal inference techniques such as the do-calculus to convert the original
model into a Markovian model, and then show that our BGLM-OFU algorithm and
another algorithm based on the linear regression both solve such linear models
with hidden variables. Our novelty includes (a) considering the combinatorial
intervention action space, (b) considering general causal models including ones
with hidden variables, (c) integrating and adapting techniques from diverse
studies such as generalized linear bandits and online influence maximization,
and (d) not relying on unrealistic assumptions such as knowing the joint
distribution of the parents of $Y$ under all interventions used in some prior
studies.
Related papers
- Online Clustering of Bandits with Misspecified User Models [42.56440072468658]
Contextual linear bandit is an online learning problem where given arm features, a learning agent selects an arm at each round to maximize the cumulative rewards in the long run.
A line of works, called the clustering of bandits (CB), utilize the collaborative effect over user preferences and have shown significant improvements over classic linear bandit algorithms.
In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM).
We devise two robust CB algorithms, RCLUMB and RSCLUMB, that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications.
arXiv Detail & Related papers (2023-10-04T10:40:50Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, which has an exponentially improved dependence on $M$ for its regret.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Combinatorial Causal Bandits without Graph Skeleton [12.615590470530227]
We study the CCB problem without the graph structure on binary general causal models and BGLMs.
We design a regret algorithm for BGLMs even without the graph skeleton and show that it still achieves $O(sqrtTln T)$ expected regret.
We propose another algorithm with $O(Tfrac23ln T)$ regret to remove the weight gap assumption.
arXiv Detail & Related papers (2023-01-31T03:45:17Z) - Pure Exploration of Causal Bandits [9.77519365079468]
Causal bandit problem integrates causal inference with multi-armed bandits.
Online learning task: given a causal graph with unknown causal inference distributions, we can choose to either intervene one variable or do no intervention.
We provide first gap-dependent fully adaptive pure exploration algorithms on three types of causal models.
arXiv Detail & Related papers (2022-06-16T02:19:37Z) - A Relational Intervention Approach for Unsupervised Dynamics
Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $hatz_i, hatz_j$ belonging to the same environment.
We empirically show that $hatZ$ estimated by our method enjoy less redundant information than previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - Exponential Family Model-Based Reinforcement Learning via Score Matching [97.31477125728844]
We propose an optimistic model-based algorithm, dubbed SMRL, for finitehorizon episodic reinforcement learning (RL)
SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
arXiv Detail & Related papers (2021-12-28T15:51:07Z) - Causal Inference Despite Limited Global Confounding via Mixture Models [4.721845865189578]
A finite $k$-mixture of such models is graphically represented by a larger graph.
We give the first algorithm to learn mixtures of non-empty DAGs.
arXiv Detail & Related papers (2021-12-22T01:04:50Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $mathcalO(dalpha T1- alpha)$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exacts characterising the estimator in high-dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.