CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning
- URL: http://arxiv.org/abs/2511.22842v1
- Date: Fri, 28 Nov 2025 02:21:17 GMT
- Title: CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning
- Authors: Panayiotis Panayiotou, Audrey Poinsot, Alessandro Leite, Nicolas Chesneau, Marc Schoenauer, Özgür Şimşek
- Abstract summary: Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms. Existing benchmarks often rely on a handful of hand-crafted or semi-synthetic datasets, leading to brittle, non-generalizable conclusions. We introduce CausalProfiler, a synthetic benchmark generator for Causal ML methods.
- Score: 37.628115292905214
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms, making it a promising tool for high-stakes decision-making. Yet, empirical evaluation practices in Causal ML remain limited. Existing benchmarks often rely on a handful of hand-crafted or semi-synthetic datasets, leading to brittle, non-generalizable conclusions. To bridge this gap, we introduce CausalProfiler, a synthetic benchmark generator for Causal ML methods. Based on a set of explicit design choices about the class of causal models, queries, and data considered, CausalProfiler randomly samples the causal models, data, queries, and ground truths that constitute a synthetic causal benchmark. In this way, Causal ML methods can be rigorously and transparently evaluated under a variety of conditions. This work offers the first random generator of synthetic causal benchmarks with coverage guarantees and transparent assumptions operating on the three levels of causal reasoning: observation, intervention, and counterfactual. We demonstrate its utility by evaluating several state-of-the-art methods under diverse conditions and assumptions, both in and out of the identification regime, illustrating the types of analyses and insights CausalProfiler enables.
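The abstract does not show CausalProfiler's actual API, but the core idea it describes can be illustrated with a minimal, hypothetical sketch: randomly sample a structural causal model (here, a linear additive-noise SCM over a random DAG, an assumption for illustration), then generate observational and interventional data for which the ground truth is known by construction. All function names and parameter choices below are invented for this sketch, not taken from the paper.

```python
import numpy as np

def sample_random_scm(n_vars, rng, edge_prob=0.5):
    """Sample a random linear additive-noise SCM over a random DAG.

    A random topological order guarantees acyclicity; edge weights
    and noise scales are drawn at random (arbitrary ranges).
    """
    order = rng.permutation(n_vars)
    weights = np.zeros((n_vars, n_vars))
    for i, child in enumerate(order):
        for parent in order[:i]:
            if rng.random() < edge_prob:
                weights[parent, child] = rng.uniform(-2.0, 2.0)
    noise_scale = rng.uniform(0.5, 1.5, size=n_vars)
    return order, weights, noise_scale

def simulate(order, weights, noise_scale, n_samples, rng, do=None):
    """Draw samples; `do={var: value}` applies a hard intervention."""
    X = np.zeros((n_samples, len(noise_scale)))
    for var in order:  # ancestors are filled in before descendants
        if do is not None and var in do:
            X[:, var] = do[var]  # intervention overrides the mechanism
        else:
            X[:, var] = X @ weights[:, var] + rng.normal(
                0.0, noise_scale[var], size=n_samples)
    return X

rng = np.random.default_rng(0)
order, W, s = sample_random_scm(5, rng)
obs = simulate(order, W, s, 10_000, rng)                        # observational
intv = simulate(order, W, s, 10_000, rng, do={int(order[0]): 1.0})  # do(X=1)
```

Because the sampled SCM is fully known, interventional (and, with shared noise draws, counterfactual) ground truths can be computed exactly, which is what makes such randomly generated benchmarks usable for rigorous evaluation.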
Related papers
- Can Causality Cure Confusion Caused By Correlation (in Software Analytics)? [4.082216579462797]
Symbolic models, particularly decision trees, are widely used in software engineering for explainable analytics. Recent studies in software engineering show that both correlational models and causal discovery algorithms suffer from pronounced instability. This study investigates causality-aware split criteria for symbolic models to improve their stability and robustness.
arXiv Detail & Related papers (2026-02-17T23:35:50Z) - Learning to Reason in LLMs by Expectation Maximization [55.721496945401846]
We formalize reasoning as a latent variable model and derive an expectation-maximization objective for learning to reason. This view connects EM and modern reward-based optimization, and shows that the main challenge lies in designing a sampling distribution that generates rationales that justify correct answers.
arXiv Detail & Related papers (2025-12-23T08:56:49Z) - Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z) - Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference [16.706959860667133]
It remains unclear whether large language models (LLMs) can handle rigorous and trustworthy statistical causal inference. The CausalPitfalls benchmark provides essential guidance and quantitative metrics to advance the development of trustworthy causal reasoning systems.
arXiv Detail & Related papers (2025-05-19T23:06:00Z) - Language Models as Causal Effect Generators [48.696932388555894]
We present sequence-driven structural causal models (SD-SCMs). An SD-SCM enables sampling from observational, interventional, and counterfactual distributions according to the desired causal structure. We propose a new type of benchmark for causal inference methods, generating individual-level counterfactual data to test treatment effect estimation.
arXiv Detail & Related papers (2024-11-12T18:50:35Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Effective Bayesian Causal Inference via Structural Marginalisation and Autoregressive Orders [16.682775063684907]
We study the use of uncertainty in causal inference over all causal models. We decompose structure marginalisation into marginalisation over (i) causal orders and (ii) directed acyclic graphs (DAGs) given an order. Our method outperforms the state of the art in structure learning on simulated non-linear additive noise benchmarks.
arXiv Detail & Related papers (2024-02-22T18:39:24Z) - Partially Specified Causal Simulations [0.0]
Much of the causal inference literature tends to design over-restricted or misspecified simulation studies.
We introduce partially randomized causal simulation (PARCS), a simulation framework that meets those desiderata.
We reproduce and extend the simulation studies of two well-known causal discovery and missing data analysis papers.
arXiv Detail & Related papers (2023-09-19T10:50:35Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Evaluating Causal Inference Methods [0.4588028371034407]
We introduce a deep generative model-based framework, Credence, to validate causal inference methods.
arXiv Detail & Related papers (2022-02-09T00:21:22Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
One major drawback of state-of-the-art artificial intelligence is its lack of explainability. Causal mechanisms can be described by structural causal models.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.