Exploratory Causal Inference in SAEnce
- URL: http://arxiv.org/abs/2510.14073v1
- Date: Wed, 15 Oct 2025 20:30:54 GMT
- Title: Exploratory Causal Inference in SAEnce
- Authors: Tommaso Mencattini, Riccardo Cadei, Francesco Locatello
- Abstract summary: We propose to discover the unknown effects of a treatment directly from data. For this, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a sparse autoencoder. Discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effects entanglement.
- Score: 25.91637307089553
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Randomized Controlled Trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale, potentially anchoring on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. For this, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a sparse autoencoder. However, discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effects entanglement. To address these challenges, we introduce Neural Effect Search, a novel recursive procedure solving both issues by progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
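The multiple-testing issue mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Neural Effect Search procedure; it only assumes a randomized binary treatment, a hypothetical set of 200 independent "neural" features of which three are truly affected, a normal-approximation test per feature, and a plain Bonferroni correction as a stand-in for any multiplicity control. All names and parameter values are illustrative.

```python
import math
import random
import statistics


def two_sided_p(treated, control):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    z = (statistics.mean(treated) - statistics.mean(control)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_features=200, n_units=400, affected=(0, 1, 2), effect=1.0, seed=0):
    """Simulate a randomized trial and test every feature for a treatment effect."""
    rng = random.Random(seed)
    # Randomized assignment: each unit is treated with probability 0.5.
    treatment = [rng.random() < 0.5 for _ in range(n_units)]
    p_values = []
    for j in range(n_features):
        # Only the features listed in `affected` are shifted by the treatment.
        shift = effect if j in affected else 0.0
        values = [rng.gauss(0.0, 1.0) + (shift if t else 0.0) for t in treatment]
        treated = [v for v, t in zip(values, treatment) if t]
        control = [v for v, t in zip(values, treatment) if not t]
        p_values.append(two_sided_p(treated, control))
    return p_values


def discoveries(p_values, alpha=0.05, bonferroni=False):
    """Indices of features whose p-value falls below the (optionally corrected) threshold."""
    threshold = alpha / len(p_values) if bonferroni else alpha
    return [j for j, p in enumerate(p_values) if p < threshold]


p_values = simulate()
naive = discoveries(p_values)
corrected = discoveries(p_values, bonferroni=True)
print(len(naive), "features flagged without correction")
print(len(corrected), "features flagged with Bonferroni correction")
```

Testing each feature at the nominal level tends to flag unaffected features purely by chance, while the corrected threshold is stricter and keeps only the strongest signals. In this toy setup the features are independent by construction, so it says nothing about the effect entanglement the abstract also names.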
Related papers
- Cross-Validated Causal Inference: a Modern Method to Combine Experimental and Observational Data [48.72384067821617]
We develop new methods to integrate experimental and observational data in causal inference. A full model containing the causal parameter is obtained by minimizing a weighted combination of experimental and observational losses. Experiments on real and synthetic data show the efficacy and reliability of our method.
arXiv Detail & Related papers (2025-11-01T22:24:16Z) - Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that Prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions such as: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - Heterogeneous Causal Discovery of Repeated Undesirable Health Outcomes [8.16644941863291]
Causal discovery offers an alternative to conventional approaches by generating cause-and-effect hypotheses from observational data. It often relies on strong or untestable assumptions, which can limit its practical application. This work aims to make causal discovery more practical by considering multiple assumptions and identifying heterogeneous effects.
arXiv Detail & Related papers (2025-03-14T15:05:17Z) - Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on Prediction-Powered Causal Inferences (PPCI). PPCI estimates the treatment effect in a target experiment with unlabeled factual outcomes, retrievable zero-shot from a pre-trained model. We validate our method on synthetic and real-world scientific data, offering solutions to instances not solvable by vanilla Empirical Risk Minimization.
arXiv Detail & Related papers (2025-02-10T10:52:17Z) - Fast Proxy Experiment Design for Causal Effect Identification [27.885243535456237]
Two approaches to estimate causal effects are observational and experimental (randomized) studies.
Direct experiments on the target variable may be too costly or even infeasible to conduct.
A proxy experiment is conducted on variables with a lower cost to intervene on compared to the main target.
arXiv Detail & Related papers (2024-07-07T11:09:38Z) - Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations. We compare 6 480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z) - Identification of Single-Treatment Effects in Factorial Experiments [0.0]
I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions.
Observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions.
I show that researchers who rely on this type of design have to justify either linearity of functional forms or specify with Directed Acyclic Graphs how variables are related in the real world.
arXiv Detail & Related papers (2024-05-16T04:01:53Z) - A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment [4.649794383775257]
We reanalyze the only known controlled experiment investigating the impact of passive voice on the subsequent activity of domain modeling.
Our results reveal that the effects observed by the original authors turned out to be much less significant than previously assumed.
arXiv Detail & Related papers (2024-02-16T16:24:00Z) - BaCaDI: Bayesian Causal Discovery with Unknown Interventions [118.93754590721173]
BaCaDI operates in the continuous space of latent probabilistic representations of both causal structures and interventions.
In experiments on synthetic causal discovery tasks and simulated gene-expression data, BaCaDI outperforms related methods in identifying causal structures and intervention targets.
arXiv Detail & Related papers (2022-06-03T16:25:48Z) - Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)