Synthetic Interventions
- URL: http://arxiv.org/abs/2006.07691v6
- Date: Tue, 31 Oct 2023 16:39:22 GMT
- Title: Synthetic Interventions
- Authors: Anish Agarwal, Devavrat Shah, Dennis Shen
- Abstract summary: We learn the expected potential outcome associated with every intervention on every unit, totaling $N \times D$ causal parameters.
We present a causal framework, synthetic interventions (SI), to infer these $N \times D$ causal parameters.
We believe our results could have implications for the design of data-efficient randomized experiments.
- Score: 20.96904429337912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a setting with $N$ heterogeneous units (e.g., individuals,
sub-populations) and $D$ interventions (e.g., socio-economic policies). Our
goal is to learn the expected potential outcome associated with every
intervention on every unit, totaling $N \times D$ causal parameters. Towards
this, we present a causal framework, synthetic interventions (SI), to infer
these $N \times D$ causal parameters while only observing each of the $N$ units
under at most two interventions, independent of $D$. This can be significant as
the number of interventions, i.e., level of personalization, grows. Under a
novel tensor factor model across units, outcomes, and interventions, we prove
an identification result for each of these $N \times D$ causal parameters and
establish finite-sample consistency of our estimator, along with asymptotic
normality under additional conditions. Importantly, our estimator also allows
for latent confounders that determine how interventions are assigned. We
further furnish the estimator with data-driven tests to examine its
suitability. Empirically, we validate our framework through a large-scale A/B
test performed on an e-commerce platform. We believe our results could have
implications for the design of data-efficient randomized experiments (e.g.,
randomized control trials) with heterogeneous units and multiple interventions.
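To make the estimation strategy concrete, the sketch below illustrates an SI-style estimate for one target unit and one intervention: a pre-intervention period in which all units are under control supplies the regression step, and donor units that later received the intervention of interest supply the counterfactual trajectory. The principal-component-regression step, array shapes, and names are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def synthetic_intervention_estimate(y_target_pre, Y_donors_pre, Y_donors_post, rank):
    """Illustrative SI-style estimate of a target unit's potential outcomes
    under one intervention, using donor units that actually received it.

    y_target_pre : (T0,)    target unit's pre-intervention (control) outcomes
    Y_donors_pre : (T0, M)  donors' pre-intervention (control) outcomes
    Y_donors_post: (T1, M)  donors' post-intervention outcomes under the
                            intervention of interest
    rank         : number of principal components kept (denoising step)
    """
    # Principal component regression: hard-threshold the donor pre-period
    # matrix to its top `rank` singular values before regressing.
    U, s, Vt = np.linalg.svd(Y_donors_pre, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank]

    # Linear weights expressing the target unit as a combination of donors.
    w = Vt_k.T @ np.diag(1.0 / s_k) @ (U_k.T @ y_target_pre)

    # Counterfactual post-intervention trajectory for the target unit.
    return Y_donors_post @ w
```

Averaging the returned trajectory gives an estimate of one of the $N \times D$ expected potential outcomes; repeating over target units and interventions fills in the full matrix.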
Related papers
- FACT: Foundation Model for Assessing Cancer Tissue Margins with Mass Spectrometry [1.0183055506531902]
FACT is an adaptation of a foundation model originally designed for text-audio association, pretrained using our proposed supervised contrastive approach based on triplet loss.
Results: Our proposed model significantly improves classification performance, achieving state of the art with an AUROC of $82.4\% \pm 0.8$.
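A minimal, generic sketch of a triplet-loss term of the kind referenced above (the margin, distance, and function names are assumptions, not the paper's implementation):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull anchors toward same-class (positive)
    embeddings and push them at least `margin` away from negatives."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```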
arXiv Detail & Related papers (2025-04-15T16:36:03Z) - Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning [56.22841701016295]
Supervised Causal Learning (SCL) is an emerging paradigm in this field.
Existing Deep Neural Network (DNN)-based methods commonly adopt the "Node-Edge approach".
arXiv Detail & Related papers (2025-02-15T19:10:35Z) - A Robust Support Vector Machine Approach for Raman COVID-19 Data Classification [0.7864304771129751]
In this paper, we investigate the performance of a novel robust formulation for Support Vector Machine (SVM) in classifying COVID-19 samples obtained from Raman spectroscopy.
We derive robust counterpart models of deterministic formulations using bounded-by-norm uncertainty sets around each observation.
The effectiveness of our approach is validated on real-world COVID-19 datasets provided by Italian hospitals.
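As a generic illustration of a bounded-by-norm robust counterpart (not necessarily the exact formulation derived in the paper), requiring the hinge constraint to hold for every perturbation $\delta_i$ with $\|\delta_i\|_p \le \rho$ is equivalent to a dual-norm regularized constraint:

$$ y_i\big(w^\top (x_i+\delta_i) + b\big) \ge 1 - \xi_i \;\; \forall\, \|\delta_i\|_p \le \rho \quad \Longleftrightarrow \quad y_i\big(w^\top x_i + b\big) - \rho\,\|w\|_q \ge 1 - \xi_i, $$

where $\|\cdot\|_q$ is the dual norm of $\|\cdot\|_p$ (i.e., $1/p + 1/q = 1$).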
arXiv Detail & Related papers (2025-01-29T14:02:45Z) - Standing on the shoulders of giants [0.0]
Item Response Theory (IRT) allows an assessment at the level of latent characteristics of instances.
IRT does not replace classical metrics but complements them, offering a new layer of evaluation and observation of the fine-grained behavior of models on specific instances.
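For orientation only, the two-parameter logistic item characteristic curve below is the textbook IRT building block; the paper's exact model and parameterization may differ, and the function and values here are purely illustrative:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability that a respondent with
    latent ability `theta` succeeds on an item with discrimination `a`
    and difficulty `b`."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print(icc_2pl(theta=0.0, a=2.0, b=-1.0))  # easy item: ~0.88
print(icc_2pl(theta=0.0, a=2.0, b=1.5))   # hard item: ~0.05
```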
arXiv Detail & Related papers (2024-09-05T00:58:07Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem in terms of two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - A Bayesian Methodology for Estimation for Sparse Canonical Correlation [0.0]
Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between data sets.
ScSCCA is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities.
We propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation.
arXiv Detail & Related papers (2023-10-30T15:14:25Z) - Hyperspectral Benchmark: Bridging the Gap between HSI Applications through Comprehensive Dataset and Pretraining [11.935879491267634]
Hyperspectral Imaging (HSI) serves as a non-destructive spatial spectroscopy technique with a multitude of potential applications.
A recurring challenge lies in the limited size of the target datasets, impeding exhaustive architecture search.
This study introduces an innovative benchmark dataset encompassing three markedly distinct HSI applications.
arXiv Detail & Related papers (2023-09-20T08:08:34Z) - Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models, based on CGDIALOG, a corpus curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for the training and inference of response generation models.
arXiv Detail & Related papers (2023-03-02T06:33:48Z) - A Statistical Learning Take on the Concordance Index for Survival Analysis [0.29005223064604074]
We provide C-index Fisher-consistency results and excess risk bounds for several commonly used cost functions in survival analysis.
We also study the general case where no model assumption is made and present a new, off-the-shelf method that is shown to be consistent with the C-index.
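For reference, a minimal sketch of Harrell's concordance index as it is usually defined (the textbook pairwise computation, not the paper's code; argument names are assumptions):

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs (the earlier time is an observed event) in
    which the higher risk score goes with the earlier time; ties count 0.5."""
    num, den = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:  # comparable pair
                den += 1
                if risk_scores[i] > risk_scores[j]:
                    num += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    num += 0.5
    return num / den
```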
arXiv Detail & Related papers (2023-02-23T14:33:54Z) - SMT-Based Safety Verification of Data-Aware Processes under Ontologies (Extended Version) [71.12474112166767]
We introduce a variant of one of the most investigated models in this spectrum, namely simple artifact systems (SASs).
The underlying description logic (DL), enjoying suitable model-theoretic properties, allows us to define SASs to which backward reachability can still be applied, leading to decidability in PSPACE of the corresponding safety problems.
arXiv Detail & Related papers (2021-08-27T15:04:11Z) - Model-Based Counterfactual Synthesizer for Interpretation [40.01787107375103]
We propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models.
We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN).
To better approximate the counterfactual universe for rare queries, we employ the umbrella sampling technique when training the MCS framework.
arXiv Detail & Related papers (2021-06-16T17:09:57Z) - A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
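A rough sketch of the RSA comparison mentioned above (metric choice and function names are assumptions, not the authors' pipeline): build a pairwise-dissimilarity profile for each semantic space over the same items, then rank-correlate the two profiles.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(embeddings_a, embeddings_b):
    """Second-order similarity between two embedding spaces over the same items:
    Spearman correlation of their condensed pairwise cosine-dissimilarity vectors."""
    rdm_a = pdist(embeddings_a, metric="cosine")
    rdm_b = pdist(embeddings_b, metric="cosine")
    rho, _ = spearmanr(rdm_a, rdm_b)
    return rho
```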
arXiv Detail & Related papers (2021-05-20T15:18:06Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.