Efficient Online Estimation of Causal Effects by Deciding What to
Observe
- URL: http://arxiv.org/abs/2108.09265v1
- Date: Fri, 20 Aug 2021 17:00:56 GMT
- Title: Efficient Online Estimation of Causal Effects by Deciding What to
Observe
- Authors: Shantanu Gupta, Zachary C. Lipton, David Childers
- Abstract summary: We aim to estimate any functional of a probabilistic model (e.g., a causal effect) as efficiently as possible, by deciding, at each time, which data source to query.
We propose online moment selection (OMS), a framework in which structural assumptions are encoded as moment conditions.
Our algorithms balance exploration with choosing the best action as suggested by current estimates of the moments.
- Score: 26.222870185443913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers often face data fusion problems, where multiple data sources are
available, each capturing a distinct subset of variables. While problem
formulations typically take the data as given, in practice, data acquisition
can be an ongoing process. In this paper, we aim to estimate any functional of
a probabilistic model (e.g., a causal effect) as efficiently as possible, by
deciding, at each time, which data source to query. We propose online moment
selection (OMS), a framework in which structural assumptions are encoded as
moment conditions. The optimal action at each step depends, in part, on the
very moments that identify the functional of interest. Our algorithms balance
exploration with choosing the best action as suggested by current estimates of
the moments. We propose two selection strategies: (1) explore-then-commit
(OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero
asymptotic regret as assessed by MSE. We instantiate our setup for average
treatment effect estimation, where structural assumptions are given by a causal
graph and data sources may include subsets of mediators, confounders, and
instrumental variables.
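The explore-then-commit idea can be illustrated with a toy sketch. This is not the paper's OMS-ETC algorithm, which operates on moment conditions. It is a simplified stand-in under assumptions that do not come from the abstract: two hypothetical data sources A and B return noisy draws with means a and b, the target is the product a*b, and the delta-method variance b^2 sd_a^2 / k + a^2 sd_b^2 / (1 - k) is minimized over the fraction k of queries sent to A. The names optimal_fraction, explore_then_commit, draw_a, and draw_b are illustrative only. The sketch shows why the best allocation depends on the very quantities being estimated.

```python
import random
import statistics


def optimal_fraction(a, b, sd_a, sd_b):
    """Fraction of queries for source A that minimizes the delta-method
    variance b^2 sd_a^2 / k + a^2 sd_b^2 / (1 - k) of the product estimate."""
    wa = abs(b) * sd_a
    wb = abs(a) * sd_b
    if wa + wb == 0.0:
        return 0.5  # no information either way: split evenly
    return wa / (wa + wb)


def explore_then_commit(draw_a, draw_b, budget, explore_frac=0.2):
    """Toy explore-then-commit: explore both sources evenly, plug the
    estimates into optimal_fraction, then commit the remaining budget."""
    if budget < 4:
        raise ValueError("budget must allow at least two draws per source")

    # Exploration phase: query both (hypothetical) sources evenly.
    n_explore = max(4, int(explore_frac * budget))
    xs = [draw_a() for _ in range(n_explore // 2)]
    ys = [draw_b() for _ in range(n_explore - n_explore // 2)]

    # Plug-in estimate of the best allocation from the exploration data.
    k_hat = optimal_fraction(
        statistics.fmean(xs), statistics.fmean(ys),
        statistics.stdev(xs), statistics.stdev(ys),
    )

    # Commit phase: make the overall split as close to k_hat as the
    # samples already collected allow.
    remaining = budget - n_explore
    target_a = round(k_hat * budget)
    extra_a = min(max(target_a - len(xs), 0), remaining)
    xs += [draw_a() for _ in range(extra_a)]
    ys += [draw_b() for _ in range(remaining - extra_a)]

    estimate = statistics.fmean(xs) * statistics.fmean(ys)
    return estimate, len(xs), len(ys)


if __name__ == "__main__":
    rng = random.Random(0)
    est, n_a, n_b = explore_then_commit(
        draw_a=lambda: rng.gauss(2.0, 1.0),  # hypothetical source A, mean a = 2
        draw_b=lambda: rng.gauss(0.5, 3.0),  # hypothetical source B, mean b = 0.5
        budget=2000,
    )
    print(f"estimate={est:.3f} (true product 1.0), queries: A={n_a}, B={n_b}")
```

In this toy setup the noisier source B should receive most of the budget. A greedy variant would instead recompute k_hat after each batch of queries rather than fixing it once after exploration.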
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean.
We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
arXiv Detail & Related papers (2024-02-20T08:30:46Z)
- A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection [0.1474723404975345]
Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task.
This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions.
arXiv Detail & Related papers (2023-11-30T17:44:22Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- PeSOTIF: a Challenging Visual Dataset for Perception SOTIF Problems in Long-tail Traffic Scenarios [12.17821905210185]
This paper provides a high-quality diverse dataset of long-tail traffic scenarios collected from multiple resources.
Considering the development of probabilistic object detection (POD), this dataset marks trigger sources that may cause perception SOTIF problems in the scenarios as key objects.
To demonstrate how to use this dataset for SOTIF research, this paper further quantifies the perception SOTIF entropy to confirm whether a scenario is unknown and unsafe for a perception system.
arXiv Detail & Related papers (2022-11-07T10:07:30Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the ATE estimate.
We apply this framework to inference from observational data under an outcome selection bias.
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation [15.451690870640295]
In some scenarios such as healthcare, usually only a few records are available for each patient, impeding the application of current reinforcement learning algorithms.
We propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics.
We show that counterfactual outcomes are identifiable under mild conditions and that Q-learning on the counterfactual-based augmented data set converges to the optimal value function.
arXiv Detail & Related papers (2020-12-16T17:21:13Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)