s-ID: Causal Effect Identification in a Sub-Population
- URL: http://arxiv.org/abs/2309.02281v2
- Date: Mon, 8 Jan 2024 22:31:19 GMT
- Title: s-ID: Causal Effect Identification in a Sub-Population
- Authors: Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash
- Abstract summary: We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID).
Existing inference problems in sub-populations operate on the premise that the given data originate from the entire population.
We provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population.
- Score: 23.221279036710012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal inference in a sub-population involves identifying the causal effect
of an intervention on a specific subgroup, which is distinguished from the
whole population through the influence of systematic biases in the sampling
process. However, ignoring the subtleties introduced by sub-populations can
either lead to erroneous inference or limit the applicability of existing
methods. We introduce and advocate for a causal inference problem in
sub-populations (henceforth called s-ID), in which we merely have access to
observational data of the targeted sub-population (as opposed to the entire
population). Existing inference problems in sub-populations operate on the
premise that the given data distributions originate from the entire population
and thus cannot tackle the s-ID problem. To address this gap, we provide necessary
and sufficient conditions that must hold in the causal graph for a causal
effect in a sub-population to be identifiable from the observational
distribution of that sub-population. Given these conditions, we present a sound
and complete algorithm for the s-ID problem.
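To make the setting concrete, the following is a minimal sketch on a hypothetical toy model, not the paper's algorithm or its identifiability conditions. It assumes binary variables with edges Z -> X, Z -> Y, X -> Y and Z -> S, where S = 1 marks membership in the sub-population; all probabilities and function names are invented for illustration. Using only the sub-population distribution P(z, x, y | S=1), it compares the naive conditional P(Y=1 | X=x, S=1) with a standard adjustment over Z and with the interventional quantity P(Y=1 | do(X=x), S=1) computed directly from the assumed model.

```python
from itertools import product

# Hypothetical toy model (an assumption, not taken from the paper):
# Z -> X, Z -> Y, X -> Y, Z -> S, all variables binary.
p_z = {0: 0.5, 1: 0.5}                    # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}           # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,
                 (0, 1): 0.5, (1, 1): 0.9}  # P(Y=1 | X=x, Z=z), keyed by (x, z)
p_s1_given_z = {0: 0.1, 1: 0.9}           # P(S=1 | Z=z)


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes the value v."""
    return p1 if v == 1 else 1.0 - p1


def subpop_joint():
    """Exact sub-population distribution P(z, x, y | S=1), keyed by (z, x, y)."""
    w = {}
    for z, x, y in product((0, 1), repeat=3):
        w[(z, x, y)] = (p_z[z] * bern(p_x1_given_z[z], x)
                        * bern(p_y1_given_xz[(x, z)], y) * p_s1_given_z[z])
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


def prob(dist, pred):
    """Total probability of the outcomes (z, x, y) that satisfy pred."""
    return sum(p for k, p in dist.items() if pred(*k))


def naive(dist, x):
    """P(Y=1 | X=x, S=1): conditioning without any adjustment."""
    num = prob(dist, lambda z, xx, y: xx == x and y == 1)
    return num / prob(dist, lambda z, xx, y: xx == x)


def adjusted(dist, x):
    """sum_z P(Y=1 | x, z, S=1) P(z | S=1), using sub-population data only."""
    total = 0.0
    for z in (0, 1):
        p_z_s = prob(dist, lambda zz, xx, y: zz == z)
        num = prob(dist, lambda zz, xx, y: zz == z and xx == x and y == 1)
        den = prob(dist, lambda zz, xx, y: zz == z and xx == x)
        total += (num / den) * p_z_s
    return total


def truth(x):
    """P(Y=1 | do(X=x), S=1) computed directly from the model parameters."""
    w = {z: p_z[z] * p_s1_given_z[z] for z in (0, 1)}
    norm = sum(w.values())
    return sum(p_y1_given_xz[(x, z)] * w[z] / norm for z in (0, 1))


if __name__ == "__main__":
    dist = subpop_joint()
    for x in (0, 1):
        print(x, naive(dist, x), adjusted(dist, x), truth(x))
```

In this particular toy graph the adjustment recovers the interventional quantity because S depends only on Z, while the naive conditional does not. Other graphs may not admit any such formula, which is the question the s-ID conditions address.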
Related papers
- Boosting Test Performance with Importance Sampling--a Subpopulation Perspective [16.678910111353307]
In this paper, we identify importance sampling as a simple yet powerful tool for solving the subpopulation problem.
We provide a new systematic formulation of the subpopulation problem and explicitly identify the assumptions that are not clearly stated in the existing works.
On the application side, we demonstrate a single estimator is enough to solve the subpopulation problem.
arXiv Detail & Related papers (2024-12-17T15:25:24Z)
- Learning With Multi-Group Guarantees For Clusterable Subpopulations [14.042643978487453]
A canonical desideratum for prediction problems is that performance guarantees should hold on average over the population.
But what constitutes a meaningful subpopulation?
We take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals.
arXiv Detail & Related papers (2024-10-18T16:38:55Z)
- Causal Effect Identification in a Sub-Population with Latent Variables [22.75558589075695]
The s-ID problem seeks to compute a causal effect in a specific sub-population from the observational data pertaining to the same sub-population.
In this paper, we consider an extension of the s-ID problem that allows for the presence of latent variables.
We propose a sound algorithm for the s-ID problem with latent variables.
arXiv Detail & Related papers (2024-05-23T13:25:41Z)
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Nonparametric Identifiability of Causal Representations from Unknown Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Learning Probabilities of Causation from Finite Population Data [40.99426447422972]
We propose a machine learning model that helps to learn the bounds of the probabilities of causation for subpopulations given finite population data.
We show in a simulation study that the machine learning model can learn the bounds of PNS for 32768 subpopulations while knowing only roughly 500 of them from the finite population data.
arXiv Detail & Related papers (2022-10-16T05:46:25Z)
- Bounding Counterfactuals under Selection Bias [60.55840896782637]
We propose a first algorithm to address both identifiable and unidentifiable queries.
We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal.
arXiv Detail & Related papers (2022-07-26T10:33:10Z)
- Nested Counterfactual Identification from Arbitrary Surrogate Experiments [95.48089725859298]
We study the identification of nested counterfactuals from an arbitrary combination of observations and experiments.
Specifically, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrary nested counterfactuals to unnested ones.
arXiv Detail & Related papers (2021-07-07T12:51:04Z)
- BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Identification Methods With Arbitrary Interventional Distributions as Inputs [8.185725740857595]
Causal inference quantifies cause-effect relationships by estimating counterfactual parameters from data.
We use Single World Intervention Graphs and a nested factorization of models associated with mixed graphs to give a very simple view of existing identification theory for experimental data.
arXiv Detail & Related papers (2020-04-02T17:27:18Z)
- Efficient Discovery of Heterogeneous Quantile Treatment Effects in Randomized Experiments via Anomalous Pattern Detection [1.9346186297861747]
Treatment Effect Subset Scan (TESS) is a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment.
In addition to the algorithm, we demonstrate that under the sharp null hypothesis of no treatment effect, the Type I and Type II errors can be controlled.
arXiv Detail & Related papers (2018-03-24T20:21:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.