Efficient Online Estimation of Causal Effects by Deciding What to Observe
- URL: http://arxiv.org/abs/2108.09265v1
- Date: Fri, 20 Aug 2021 17:00:56 GMT
- Title: Efficient Online Estimation of Causal Effects by Deciding What to Observe
- Authors: Shantanu Gupta, Zachary C. Lipton, David Childers
- Abstract summary: We aim to estimate any functional of a probabilistic model (e.g., a causal effect) as efficiently as possible, by deciding, at each time, which data source to query.
We propose online moment selection (OMS), a framework in which structural assumptions are encoded as moment conditions.
Our algorithms balance exploration with choosing the best action as suggested by current estimates of the moments.
- Score: 26.222870185443913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers often face data fusion problems, where multiple data sources are
available, each capturing a distinct subset of variables. While problem
formulations typically take the data as given, in practice, data acquisition
can be an ongoing process. In this paper, we aim to estimate any functional of
a probabilistic model (e.g., a causal effect) as efficiently as possible, by
deciding, at each time, which data source to query. We propose online moment
selection (OMS), a framework in which structural assumptions are encoded as
moment conditions. The optimal action at each step depends, in part, on the
very moments that identify the functional of interest. Our algorithms balance
exploration with choosing the best action as suggested by current estimates of
the moments. We propose two selection strategies: (1) explore-then-commit
(OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero
asymptotic regret as assessed by MSE. We instantiate our setup for average
treatment effect estimation, where structural assumptions are given by a causal
graph and data sources may include subsets of mediators, confounders, and
instrumental variables.
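The abstract's central point is that the optimal query policy depends on the very moments being estimated. The sketch below illustrates this under a toy setup that is an assumption, not taken from the paper. The target functional is θ = μ1·μ2. Source 1 returns draws with mean μ1, source 2 returns draws with mean μ2, and every query costs one unit. By the delta method, sending a fraction κ of T queries to source 1 gives Var(θ̂) ≈ (μ2²σ1²/κ + μ1²σ2²/(1−κ))/T. This is minimized at κ* = |μ2|σ1 / (|μ2|σ1 + |μ1|σ2), which depends on the unknown means, so the policy has to estimate it. The `etc` branch plugs the exploration estimates in once and commits. The `etg` branch re-estimates κ* after every query and tops up the under-sampled source. The names `optimal_fraction`, `Source`, and `run` are hypothetical, and this is a simplified illustration, not the authors' OMS-ETC or OMS-ETG implementation.

```python
import math
import random


def optimal_fraction(m1: float, m2: float, s1: float, s2: float) -> float:
    """Share of queries for source 1 minimizing the delta-method variance of
    theta_hat = mean1 * mean2:  Var ~ (m2^2 s1^2 / k + m1^2 s2^2 / (1 - k)) / T.
    Setting the derivative to zero gives k* = a / (a + b), a = |m2| s1, b = |m1| s2.
    """
    a, b = abs(m2) * s1, abs(m1) * s2
    return 0.5 if a + b == 0 else a / (a + b)


class Source:
    """Running sufficient statistics for one hypothetical Gaussian data source."""

    def __init__(self, mu: float, sd: float, rng: random.Random):
        self.mu, self.sd, self.rng = mu, sd, rng
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def query(self) -> None:
        x = self.rng.gauss(self.mu, self.sd)  # one sample from this source
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self) -> float:
        return self.total / self.n

    def std(self) -> float:
        var = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))


def run(policy: str, budget: int, explore: int, seed: int = 0):
    """Toy explore-then-commit ('etc') or explore-then-greedy ('etg') allocation.

    The true means are hidden from the policy; it only sees the samples.
    Returns (theta_hat, fraction of the budget spent on source 1).
    """
    if explore < 4 or explore > budget:
        raise ValueError("need 4 <= explore <= budget")
    rng = random.Random(seed)
    # Assumed toy truth: mu1 = 1, mu2 = 4, unit noise, so k* = 4 / (4 + 1) = 0.8.
    src = [Source(1.0, 1.0, rng), Source(4.0, 1.0, rng)]

    # Exploration phase: split the first `explore` queries evenly.
    for t in range(explore):
        src[t % 2].query()

    def k_hat() -> float:
        return optimal_fraction(src[0].mean(), src[1].mean(),
                                src[0].std(), src[1].std())

    remaining = budget - explore
    if policy == "etc":
        # Commit once: fix the rest of the plan from the exploration estimate of k*.
        extra_1 = min(max(round(k_hat() * budget) - src[0].n, 0), remaining)
        for _ in range(extra_1):
            src[0].query()
        for _ in range(remaining - extra_1):
            src[1].query()
    elif policy == "etg":
        # Greedy: re-estimate k* before every query and sample whichever
        # source is below its estimated optimal share.
        for t in range(explore, budget):
            k = 0 if src[0].n < k_hat() * (t + 1) else 1
            src[k].query()
    else:
        raise ValueError(f"unknown policy: {policy}")

    return src[0].mean() * src[1].mean(), src[0].n / budget


for policy in ("etc", "etg"):
    theta_hat, share = run(policy, budget=10_000, explore=1_000)
    print(policy, round(theta_hat, 3), round(share, 3))
```

In this toy, θ is four times more sensitive to noise in source 1 than in source 2, so both policies should drift toward spending roughly 80% of the budget on source 1. A fixed 50/50 split would waste queries on the less influential source. ETG keeps refining that share as data accrues, while ETC relies only on the exploration-phase estimate.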
Related papers
- Online Data Collection for Efficient Semiparametric Inference [41.49486724979923]
We present two online data collection policies, Explore-then-Commit and Explore-then-Greedy, that use the parameter estimates at a given time to optimally allocate the remaining budget in the future steps.
We prove that both policies achieve zero regret (assessed by MSE) relative to an oracle policy.
(2024-11-05T15:40:53Z)
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
(2024-06-29T20:56:34Z)
- Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean.
We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
(2024-02-20T08:30:46Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
(2023-07-31T11:28:24Z)
- Identification and multiply robust estimation in causal mediation analysis across principal strata [7.801213477601286]
We consider assessing causal mediation in the presence of a post-treatment event.
We derive the efficient influence function for each mediation estimand, which motivates a set of multiply robust estimators for inference.
(2023-04-20T00:39:20Z)
- PeSOTIF: a Challenging Visual Dataset for Perception SOTIF Problems in Long-tail Traffic Scenarios [12.17821905210185]
This paper provides a high-quality diverse dataset of long-tail traffic scenarios collected from multiple resources.
Considering the development of probabilistic object detection (POD), this dataset marks trigger sources that may cause perception SOTIF problems in the scenarios as key objects.
To demonstrate how to use this dataset for SOTIF research, this paper further quantifies the perception SOTIF entropy to confirm whether a scenario is unknown and unsafe for a perception system.
(2022-11-07T10:07:30Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
(2021-03-30T21:20:51Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
(2020-06-10T20:20:10Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
(2020-02-20T15:00:54Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
(2020-02-18T06:29:01Z)