Synthetic Potential Outcomes and Causal Mixture Identifiability
- URL: http://arxiv.org/abs/2405.19225v4
- Date: Fri, 13 Dec 2024 06:58:47 GMT
- Title: Synthetic Potential Outcomes and Causal Mixture Identifiability
- Authors: Bijan Mazaheri, Chandler Squires, Caroline Uhler
- Abstract summary: Heterogeneity can be resolved at multiple levels by grouping populations according to different notions of similarity.
This paper proposes grouping with respect to the causal response of an intervention or perturbation on the system.
- Score: 9.649642656207869
- Abstract: Heterogeneous data from multiple populations, sub-groups, or sources is often represented as a "mixture model" with a single latent class influencing all of the observed covariates. Heterogeneity can be resolved at multiple levels by grouping populations according to different notions of similarity. This paper proposes grouping with respect to the causal response of an intervention or perturbation on the system. This definition is distinct from previous notions, such as similar covariate values (e.g. clustering) or similar correlations between covariates (e.g. Gaussian mixture models). To solve the problem, we "synthetically sample" from a counterfactual distribution using higher-order multi-linear moments of the observable data. To understand how these "causal mixtures" fit in with more classical notions, we develop a hierarchy of mixture identifiability.
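As a rough illustration of the mixture model described in the abstract, the sketch below builds a tiny latent-class model and checks that a multi-linear moment of the observable data factorizes over the latent class. It is not the paper's method. The parameter values, the names `class_probs` and `p_one`, and the assumption that the covariates are conditionally independent given the latent class are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical parameters (not from the paper): two latent classes,
# three binary covariates.
class_probs = [0.3, 0.7]        # P(U = u)
p_one = [[0.9, 0.8, 0.2],       # p_one[u][i] = P(X_i = 1 | U = u)
         [0.1, 0.3, 0.6]]


def joint_prob(x):
    """P(X = x), marginalizing over the latent class U.

    Assumes the covariates are conditionally independent given U.
    """
    total = 0.0
    for pu, probs in zip(class_probs, p_one):
        term = pu
        for xi, p in zip(x, probs):
            term *= p if xi == 1 else 1.0 - p
        total += term
    return total


def moment_by_enumeration(indices):
    """E[prod_{i in indices} X_i], computed from the observable joint only."""
    total = 0.0
    for x in product([0, 1], repeat=3):
        value = 1
        for i in indices:
            value *= x[i]
        total += value * joint_prob(x)
    return total


def moment_by_factorization(indices):
    """The same moment written as sum_u P(u) * prod_i E[X_i | U = u]."""
    total = 0.0
    for pu, probs in zip(class_probs, p_one):
        term = pu
        for i in indices:
            term *= probs[i]
        total += term
    return total
```

Calling `moment_by_enumeration((0, 1))` and `moment_by_factorization((0, 1))` should agree up to floating-point error, which is the structural property that moment-based approaches to mixtures rely on under this conditional-independence assumption.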
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
arXiv Detail & Related papers (2024-03-15T14:26:53Z)
- C-Mixup: Improving Generalization in Regression [71.10418219781575]
The Mixup algorithm improves generalization by linearly interpolating a pair of examples and their corresponding labels.
We propose C-Mixup, which adjusts the sampling probability based on the similarity of the labels.
C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively.
arXiv Detail & Related papers (2022-10-11T20:39:38Z)
- Joint Probability Estimation Using Tensor Decomposition and Dictionaries [3.4720326275851994]
We study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals.
We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture.
arXiv Detail & Related papers (2022-03-03T11:55:51Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
- Mixture Representation Learning with Coupled Autoencoders [1.589915930948668]
We propose an unsupervised variational framework using multiple interacting networks called cpl-mixVAE.
In this framework, the mixture representation of each network is regularized by imposing a consensus constraint on the discrete factor.
We use the proposed method to jointly uncover discrete and continuous factors of variability describing gene expression in a single-cell transcriptomic dataset.
arXiv Detail & Related papers (2020-07-20T04:12:04Z)
- Mixture of Conditional Gaussian Graphical Models for unlabelled heterogeneous populations in the presence of co-factors [0.0]
Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector.
In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into sub-population corresponding clusters.
arXiv Detail & Related papers (2020-06-19T11:57:30Z)
- Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations [84.81435917024983]
This work proposes an algorithm that consistently estimates any identifiable mixture model from grouped observations.
A practical implementation is provided for paired observations, and the approach is shown to outperform existing methods.
arXiv Detail & Related papers (2020-06-12T20:44:22Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
- Causal Structure Discovery from Distributions Arising from Mixtures of DAGs [12.12755951035594]
We consider distributions arising from a mixture of causal models, where each model is represented by a directed acyclic graph (DAG).
We show that such algorithms recover a "union" of the component DAGs and can identify variables whose conditional distribution across the component DAGs vary.
As an immediate application, we demonstrate how retrieval of this causal information can be used to cluster samples according to each mixture component.
arXiv Detail & Related papers (2020-01-31T16:33:26Z)
- Blocked Clusterwise Regression [0.0]
We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
arXiv Detail & Related papers (2020-01-29T23:29:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.