Estimating Causal Effects in Gaussian Linear SCMs with Finite Data
- URL: http://arxiv.org/abs/2601.04673v1
- Date: Thu, 08 Jan 2026 07:37:10 GMT
- Title: Estimating Causal Effects in Gaussian Linear SCMs with Finite Data
- Authors: Aurghya Maiti, Prateek Jain
- Abstract summary: Estimating causal effects from observational data remains a fundamental challenge in causal inference. This paper focuses on estimating causal effects in Gaussian Linear Structural Causal Models (GL-SCMs). We present a novel EM-based estimation algorithm that can learn model parameters and estimate identifiable causal effects from finite observational samples.
- Score: 14.222953715948272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating causal effects from observational data remains a fundamental challenge in causal inference, especially in the presence of latent confounders. This paper focuses on estimating causal effects in Gaussian Linear Structural Causal Models (GL-SCMs), which are widely used due to their analytical tractability. However, parameter estimation in GL-SCMs is often infeasible with finite data, primarily due to overparameterization. To address this, we introduce the class of Centralized Gaussian Linear SCMs (CGL-SCMs), a simplified yet expressive subclass where exogenous variables follow standardized distributions. We show that CGL-SCMs are equally expressive in terms of causal effect identifiability from observational distributions and present a novel EM-based estimation algorithm that can learn CGL-SCM parameters and estimate identifiable causal effects from finite observational samples. Our theoretical analysis is validated through experiments on synthetic data and benchmark causal graphs, demonstrating that the learned models accurately recover causal distributions.
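The overparameterization mentioned in the abstract can be illustrated with a small sketch. The toy model below (one latent confounder U, a treatment X, and an outcome Y) is an assumption made for illustration and is not taken from the paper; the function names `observed_cov` and `standardize_latent` are likewise hypothetical. It shows that rescaling the latent variable to unit variance, while absorbing its scale into the outgoing coefficients, leaves the observational covariance unchanged, which is the intuition behind restricting exogenous variables to standardized distributions. The precise CGL-SCM definition should be taken from the paper itself.

```python
import math

# Hypothetical toy model (an assumption for illustration, not the paper's):
#   U ~ N(0, var_u)                        latent confounder
#   X = a*U + e_x,        e_x ~ N(0, var_x)
#   Y = b*X + c*U + e_y,  e_y ~ N(0, var_y)


def observed_cov(a, b, c, var_u, var_x, var_y):
    """Return (Var(X), Cov(X, Y), Var(Y)) implied by the toy model."""
    vxx = a * a * var_u + var_x
    cxy = b * vxx + a * c * var_u
    vyy = b * b * vxx + 2 * a * b * c * var_u + c * c * var_u + var_y
    return (vxx, cxy, vyy)


def standardize_latent(a, c, var_u):
    """Absorb the latent's scale into its outgoing coefficients.

    Returns new (a, c, var_u) with var_u fixed to 1.
    """
    scale = math.sqrt(var_u)
    return a * scale, c * scale, 1.0


# Two parameterizations: the original and one with a unit-variance latent.
b = 1.5  # direct effect of X on Y, untouched by the rescaling
cov_original = observed_cov(0.8, b, -0.6, 4.0, 1.0, 0.5)
a_std, c_std, var_u_std = standardize_latent(0.8, -0.6, 4.0)
cov_standardized = observed_cov(a_std, b, c_std, var_u_std, 1.0, 0.5)

same = all(math.isclose(p, q) for p, q in zip(cov_original, cov_standardized))
print("observational covariances match:", same)
```

Because both parameter settings imply the same Gaussian observational distribution, no amount of observational data can tell them apart; fixing the latent variance removes that redundant degree of freedom without changing the coefficient b in this toy model.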
Related papers
- Linear-LLM-SCM: Benchmarking LLMs for Coefficient Elicitation in Linear-Gaussian Causal Models [28.281361951823765]
We introduce Linear-LLM-SCM, a plug-and-play benchmarking framework for evaluating large language models (LLMs). We show challenges in such benchmarking tasks, namely, strongity in the results in some of the models and susceptibility to DAG misspecification via spurious edges in the continuous domains. We also open-source the benchmarking framework so that researchers can utilize their DAGs and any off-the-shelf LLMs plug-and-play for evaluation in their domains effortlessly.
arXiv Detail & Related papers (2026-02-10T20:49:01Z) - Causal Discovery for Linear DAGs with Dependent Latent Variables via Higher-order Cumulants [7.808674222118538]
Existing methods assume mutually independent latent confounders or cannot properly handle models with causal relationships among observed variables. We propose a novel algorithm that identifies causal DAGs in LvLiNGAM, allowing causal structures among latent variables, among observed variables, and between the two.
arXiv Detail & Related papers (2025-10-16T15:15:20Z) - Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [60.83579255387347]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Bayesian Causal Inference with Gaussian Process Networks [1.7188280334580197]
We consider the problem of the Bayesian estimation of the effects of hypothetical interventions in the Gaussian Process Network model.
We detail how to perform causal inference on GPNs by simulating the effect of an intervention across the whole network and propagating the effect of the intervention on downstream variables.
We extend both frameworks beyond the case of a known causal graph, incorporating uncertainty about the causal structure via Markov chain Monte Carlo methods.
arXiv Detail & Related papers (2024-02-01T14:39:59Z) - TSLiNGAM: DirectLiNGAM under heavy tails [0.0]
We propose TSLiNGAM, a new method for identifying the DAG of a causal model based on observational data.
TSLiNGAM builds on DirectLiNGAM, a popular algorithm which uses simple OLS regression for identifying causal directions between variables.
It performs significantly better on heavy-tailed and skewed data and demonstrates a high small-sample efficiency.
arXiv Detail & Related papers (2023-08-10T08:34:46Z) - Learning Latent Structural Causal Models [31.686049664958457]
In machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors.
We present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent Structural Causal Model.
arXiv Detail & Related papers (2022-10-24T20:09:44Z) - MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z) - BCDAG: An R package for Bayesian structure and Causal learning of Gaussian DAGs [77.34726150561087]
We introduce the R package for causal discovery and causal effect estimation from observational data.
Our implementation scales efficiently with the number of observations and, whenever the DAGs are sufficiently sparse, the number of variables in the dataset.
We then illustrate the main functions and algorithms on both real and simulated datasets.
arXiv Detail & Related papers (2022-01-28T09:30:32Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)